United States Department of Labor

# Handbook of Methods Current Employment Statistics - State and Metro Area Calculation

## Current Employment Statistics - State and Metro Area: Calculation

The following section explains how the Current Employment Statistics-State and Area (CES-SA) program produces employment, hours, and earnings estimates. The CES-National (CES-N) program uses the same sample and collection methods, so references to CES apply to both the CES-N and CES-SA programs.

### Monthly estimation

The Current Employment Statistics (CES) program uses a matched sample concept and weighted sample data to produce employment, hours, and earnings estimates. The methods are summarized in exhibit 1 and described in more detail below.

Exhibit 1. Summary of methods for computing estimates of employment, hours, and earnings

| Employment, hours, and earnings | Basic estimating cell (industry, 6-digit published level) | Aggregate industry level (super sector and, where stratified, industry) | Annual average data |
| --- | --- | --- | --- |
| All employees | All employee estimate for previous month multiplied by weighted ratio of all employees in current month to all employees in previous month, for sample establishments that reported for both months, plus net birth/death forecast. | Sum of all employee estimates for component cells. | Sum of monthly estimates divided by 12. |
| Average weekly hours of all employees | All employee hours divided by number of all employees. | Average, weighted by all employees, of the average weekly hours for component cells. | Annual total of aggregate hours (all employees multiplied by average weekly hours) divided by annual sum of all employees. |
| Average hourly earnings of all employees | All employee payroll divided by all employee hours. | Average, weighted by aggregate hours, of the average hourly earnings for component cells. | Annual total of aggregate payrolls (all employees multiplied by weekly hours and hourly earnings) divided by annual aggregate hours. |
| Average weekly earnings of all employees | Product of all employee average weekly hours and all employee average hourly earnings. | Product of all employee average weekly hours and all employee average hourly earnings. | Sum of monthly all employee aggregate payrolls divided by the sum of monthly all employees. |
| Production and nonsupervisory employees | All employee estimate for current month multiplied by weighted ratio of production and nonsupervisory employees to all employees in sample establishments for current month. | Sum of estimates of production and nonsupervisory employees for component cells. | Sum of monthly estimates divided by 12. |
| Average weekly hours of production and nonsupervisory employees | Production and nonsupervisory employee hours divided by number of production and nonsupervisory employees. | Average, weighted by production and nonsupervisory employment, of the average weekly hours for component cells. | Annual total of aggregate hours (production and nonsupervisory employment multiplied by average weekly hours) divided by annual sum of production and nonsupervisory employment. |
| Average hourly earnings of production and nonsupervisory employees | Total production and nonsupervisory employee payroll divided by total production and nonsupervisory employee hours. | Average, weighted by aggregate hours, of the average hourly earnings for component cells. | Annual total of aggregate payrolls (production and nonsupervisory employment multiplied by weekly hours and hourly earnings) divided by annual aggregate hours. |
| Average weekly earnings of production and nonsupervisory employees | Product of production and nonsupervisory employee average weekly hours and production and nonsupervisory employee average hourly earnings. | Product of production and nonsupervisory employee average weekly hours and production and nonsupervisory employee average hourly earnings. | Sum of monthly aggregate payrolls divided by the sum of monthly production employees. |

Source: U.S. Bureau of Labor Statistics.
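As an illustration of the "Aggregate industry level" column of exhibit 1, the following minimal Python sketch combines component cells into an aggregate. The `Cell` class, its field names, and the `aggregate` function are hypothetical names chosen for this example; they are not part of any CES system.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """Hypothetical container for one component cell's estimates."""
    employment: float           # all employees in the cell
    avg_weekly_hours: float     # average weekly hours (AWH) for the cell
    avg_hourly_earnings: float  # average hourly earnings (AHE) for the cell


def aggregate(cells):
    """Combine component cells following the aggregate-level rules in exhibit 1."""
    total_emp = sum(c.employment for c in cells)
    # Aggregate hours = employment x average weekly hours.
    agg_hours = sum(c.employment * c.avg_weekly_hours for c in cells)
    # Aggregate payroll = employment x weekly hours x hourly earnings.
    agg_payroll = sum(
        c.employment * c.avg_weekly_hours * c.avg_hourly_earnings for c in cells
    )
    awh = agg_hours / total_emp    # employment-weighted average of cell AWH
    ahe = agg_payroll / agg_hours  # aggregate-hours-weighted average of cell AHE
    awe = awh * ahe                # average weekly earnings = AWH x AHE
    return {"employment": total_emp, "awh": awh, "ahe": ahe, "awe": awe}


result = aggregate([Cell(100, 40.0, 20.0), Cell(300, 30.0, 10.0)])
```

Weighting hourly earnings by aggregate hours, rather than by employment, keeps the aggregate weekly earnings equal to total payroll divided by total employment.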

#### Stratification

Current Employment Statistics - State and Area (CES-SA) uses a cell stratification structure for each state and metropolitan area for which all employee (AE) estimates are provided. Cells within each geographic area are defined by ownership and North American Industry Classification System (NAICS) industry. They may range in detail from a high-level aggregate (e.g., goods-producing, private service-providing, or government) in small areas, to a more detailed 6-digit NAICS level, typically in statewide or larger metropolitan area estimates. Estimating cell structures may differ for production and nonsupervisory employees (PE), and hours and earnings for both AE and PE.

CES-SA uses a top-down estimation approach, in which the sum of AE estimates, sometimes calculated at a detailed industry level, is constrained to values derived at an estimation super sector (ESS) level, usually defined at the 2-digit NAICS level. This allows for the publication of significant industry detail while lessening the degree to which estimation error in cells with small sample sizes accumulates to total nonfarm. The list of potential basic ESS is provided in exhibit 2. In some states and areas, ESS listed in exhibit 2 may be combined to pool more sample for estimation purposes. Where the basic AE structure matches the ESS, separate ESS estimates may not be needed.

Exhibit 2. Estimation super sectors (ESS)

| Current Employment Statistics industry code | Major sector name | North American Industry Classification System codes included [1] |
| --- | --- | --- |
| 10-000000 | Mining and logging | 1133, 21 |
| 20-000000 | Construction | 23 |
| 31-000000 | Durable goods manufacturing | 33, 32 [2] |
| 32-000000 | Nondurable goods manufacturing | 31, 32 [2] |
| 41-000000 | | |
| 42-000000 | | |
| 43-000000 | Transportation, warehousing, and utilities | 22, 48–49 |
| 50-000000 | Information | 51 |
| 55-000000 | Financial activities | 52, 53 |
| 60-540000 | Professional, scientific, and technical services | 54 |
| 60-550000 | Management of companies and enterprises | 55 |
| 60-560000 | | |
| 65-610000 | Educational services | 61 |
| 65-620000 | Health care and social assistance | 62 |
| 70-710000 | Arts, entertainment, and recreation | 71 |
| 70-720000 | Accommodation and food services | 72 |
| 80-000000 | Other services | 811, 812, 813 |

[1] Only establishments with private ownership are used in each ESS.

[2] CES allocates 3-digit NAICS industries to this ESS based on industry description.

Source: U.S. Bureau of Labor Statistics.

State and local government is stratified into education and noneducation basic cells at the statewide level, except in the District of Columbia, Puerto Rico, and the U.S. Virgin Islands. Total federal government employment is a basic cell at the statewide level with the exception of three states, where it is stratified into more detailed basic cells. Total government is a basic cell in 43 metropolitan areas; total federal, state, and local government form basic cells in 325 areas, and 63 areas have other government basic structures (e.g., total federal, total state, local education, and local excluding education).

In addition to the basic estimation cells, CES-SA produces some industry estimates independently; these series are not used for aggregating to the total nonfarm level, but may be combined with other independent estimates to form a summary estimate at an intermediate level.

Hours and earnings series structures differ from structures for AE, generally providing less industry detail. All-employee hours and earnings estimates are available at least at the goods-producing and private service-providing level in all 50 states and D.C., and at total private for Puerto Rico and all metropolitan areas. Hours and earnings of production employees in manufacturing are available for all states, D.C., Puerto Rico, the U.S. Virgin Islands, and 28 metropolitan areas. Additional hours and earnings detail varies by state and area.

#### Matched sample

A matched sample is defined to be all sample units that have reported data for the reference month and the month prior. Exhibit 3 shows all data that must be reported and matched for estimating each series type. Regardless of the data type being estimated, the respondent must provide AE for both months. The matched sample excludes any sample unit reporting that it is out-of-business and has zero employees. The section on birth–death model estimation more fully describes this aspect of the estimation methodology.

Exhibit 3. Data types needed for Current Employment Statistics estimation

| Estimate | Reported data types |
| --- | --- |
| All employees | All employees |
| Production and nonsupervisory employees | All employees, production and nonsupervisory employees |
| Average weekly hours of all employees | All employees, total weekly hours of all employees |
| Average hourly earnings of all employees | All employees, total weekly hours of all employees, total weekly payroll of all employees |
| Average weekly hours of production and nonsupervisory employees | Production and nonsupervisory employees, total weekly hours of production and nonsupervisory employees |
| Average hourly earnings of production and nonsupervisory employees | Production and nonsupervisory employees, total weekly hours of production and nonsupervisory employees, total weekly payroll of production and nonsupervisory employees |

Source: U.S. Bureau of Labor Statistics.

#### Estimators for all employees

The robust weighted-link-relative estimator for AE series shown in equation 1 uses the relative change in the weighted matched sample of the estimating cell to move the previous month’s estimate forward to the current-month estimated level. A model-based component is applied to account for the net employment change resulting from business births and deaths not captured by the sample.

In some cases, a respondent’s microdata may be identified as an outlier and may receive a weight-reduction factor or be treated as atypical. This identification is generally performed by an automated procedure (described below). Atypical treatment allows CES-SA to use valid microdata to only represent the change in that particular establishment but not for changes in other establishments. In addition, strikes are treated as atypical, and other unique circumstances, such as damage to an individual establishment, may be treated as atypical as well.

Additional weight adjustments are used in certain cells where specific industries’ differential response rates are known to cause problems in estimates. Employment in religious organizations not subject to unemployment insurance (UI) laws is handled in a similar fashion to atypical employment.

The robust weighted link relative is used to estimate employment when adequate sample exists to directly estimate the relative over-the-month employment change in a given cell. A cell’s sample data is considered adequate for robust estimation when it passes disclosure tests and has either:

• a minimum of 30 average responding UIs, or
• a minimum population employment of 3,000 with 50 percent average unweighted sample coverage (i.e., the industry grouping in the designated geography must employ at least 3,000, and the sample must include firms in the industry and/or geography whose total reported employment adds up to at least half of that number).
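The adequacy criteria above can be written as a small check. This is a sketch only: the function name and parameters are hypothetical, the disclosure test is passed in as a precomputed flag, and coverage is taken to be average reported sample employment divided by population employment.

```python
def sample_is_adequate(passes_disclosure, avg_responding_uis,
                       population_employment, avg_sample_employment):
    """Return True if a cell qualifies for robust link-relative estimation.

    All parameter names are illustrative. Coverage is assumed to be the
    average unweighted sample employment over population employment.
    """
    if not passes_disclosure:
        return False
    # Criterion 1: at least 30 average responding UI accounts.
    enough_uis = avg_responding_uis >= 30
    # Criterion 2: population employment of at least 3,000 with at least
    # 50 percent average unweighted sample coverage.
    if population_employment > 0:
        coverage = avg_sample_employment / population_employment
    else:
        coverage = 0.0
    enough_coverage = population_employment >= 3000 and coverage >= 0.5
    return enough_uis or enough_coverage
```

Cells for which this check fails would be routed to the small area model described in the section on special estimation methods.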

For some employment series, the sample is not adequate to use the robust weighted link relative. In these cases, a small area model is used in estimation, described in the section on special estimation methods. This model uses direct sample-based estimates of employment and variance, projections of historical (benchmarked) data, and the real-time relationship of these variables across domains to decrease volatility in estimation.

###### Equation 1. Current month estimate of all employees

$\widehat{AE}_c = \left( \widehat{AE}_p - \sum_j ae^*_{p,j} - NCE_{RO} \right) \times \dfrac{\sum_i ae_{c,i} \times w_i \times r_i \times d_i}{\sum_i ae_{p,i} \times w_i \times r_i \times d_i} + \sum_j ae^*_{c,j} + b_c + NCE_{RO}$

for all $i \in I$ and $j \in J$

where,

$i$ = matched sample unit;

$j$ = matched sample unit where the current month is atypical;

$w_i$ = weight associated with the CES report;

$r_i$ = down-weight factor associated with the CES report;

$d_i$ = differential response rate factor associated with the CES report;

$ae_{c,i}$ = current month reported all employees;

$ae_{p,i}$ = previous month reported all employees;

$ae^*_{c,j}$ = current month reported all employees where the current month is atypical;

$ae^*_{p,j}$ = previous month reported all employees where the current month is atypical;

$\widehat{AE}_c$ = current month estimated all employees;

$\widehat{AE}_p$ = previous month estimated all employees;

$b_c$ = current month net birth-death forecast; and

$NCE_{RO}$ = noncovered employment in religious organizations (NAICS 813110).
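The following Python sketch shows one way equation 1 could be evaluated for a single estimating cell. The `Report` class and function name are hypothetical. It assumes that atypical reports are excluded from the link-relative ratio and enter only through their own reported levels, which matches the description of atypical treatment in this section but is a simplification of production processing.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """Hypothetical matched-sample report for one unit."""
    ae_prev: float              # previous month reported all employees
    ae_curr: float              # current month reported all employees
    weight: float = 1.0         # w_i
    downweight: float = 1.0     # r_i
    diff_response: float = 1.0  # d_i
    atypical: bool = False      # True if the unit represents only itself


def estimate_all_employees(ae_prev_estimate, reports,
                           birth_death=0.0, nce_ro=0.0):
    """Move the previous month's AE estimate forward using equation 1."""
    typical = [r for r in reports if not r.atypical]
    atypical = [r for r in reports if r.atypical]
    num = sum(r.ae_curr * r.weight * r.downweight * r.diff_response
              for r in typical)
    den = sum(r.ae_prev * r.weight * r.downweight * r.diff_response
              for r in typical)
    if den == 0:
        raise ValueError("no previous-month employment in the matched sample")
    link_relative = num / den
    # Remove atypical and noncovered employment before applying the link.
    base = ae_prev_estimate - sum(r.ae_prev for r in atypical) - nce_ro
    # Add atypical levels, the birth-death forecast, and noncovered employment back.
    return (base * link_relative + sum(r.ae_curr for r in atypical)
            + birth_death + nce_ro)


sample = [
    Report(100, 110, weight=2.0),
    Report(200, 200, weight=1.0),
    Report(50, 10, weight=1.0, atypical=True),
]
estimate = estimate_all_employees(1000.0, sample, birth_death=5.0)
```

In this example the weighted link relative is 420 / 400 = 1.05, and the atypical unit's drop from 50 to 10 affects the estimate one-for-one rather than through the ratio.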

The CES-SA matched sample sometimes contains a small number of reports that may have a large and adverse effect on the estimate. The influence of such reports may be because of large sample weights, a significant change in the reported employment levels, or a combination of these factors. If left untreated, influential reports may result in excessive volatility in the monthly estimates, especially at the most detailed publication levels.

The robust estimation procedure is designed to reduce the effect of the influential observations on the estimate of the relative over-the-month change. At the same time, it is recognized that the unusual and influential sample movements may represent similar behavior in the target population and a heavy intervention to the regular estimation procedure may lead to biased estimates. This is especially true if the sample is large. Therefore, the estimator is designed to reduce the volatility of the estimates that are due to extreme outlying reports while controlling the intervention to protect against the incurred bias.

### Definition of influential reports

The weighted link relative estimate is based on the ratio of weighted sums of 2 months of matched sample employment data. A scatterplot of the weighted employment reported in 2 consecutive months provides an insight on what units influence the estimate. Two examples are shown in figure 1: the survey weighted employment reported for the month (t) is plotted against the weighted employment reported for the previous month (t-1). The reference line represents the sample link relative, constructed in these examples so that the line has a slope of 1—markers on the line have constant employment.

An influential report would usually have a relatively large survey weight and/or a large change in its reported employment, and would show an unusual vertical residual from the reference line compared with other reports. Numerically, the influence of a report on the sample link relative estimate can be expressed in the form of weighted residuals.

###### Equation 2. Weighted residuals

$d_{i,t} = w_i \left( ae_{c,i} - R_t \, ae_{p,i} \right)$

where,

$d_{i,t}$ = the weighted residual for unit $i$ at month $t$;

$ae_{c,i}$ and $ae_{p,i}$ = a unit’s current and previous months’ reported employment, respectively;

$w_i$ = the selection weight; and

$R_t$ = the sample link relative estimate in a given estimation cell identified by formula 1.

The influential reports are those having large positive or negative values of the weighted residuals compared with the other sample units. The extreme residuals are reduced to specific cut-off values. The cut-off values depend on the distribution of the weighted residuals in a given series and are determined independently for each month and industry series. Pushing the extreme residuals to the cut-off values is accomplished by using an appropriate weight adjustment factor.

The robust estimator addresses this problem by using a particular variation of a general method of weight reduction known as “Winsorization”.⁠1 This process identifies influential reports at the UI level for each estimating cell.

The actual cut-off values are determined by examining the relative distances of units with extreme weighted residuals to the nearest but less extreme values in the same cell and month.

The first step consists of calculating the weighted residuals using equation 2. The weighted residuals from individual establishments are aggregated to the UI account level within each estimation cell. This is done because reports within a UI account may have similar reported change in employment. Since they are similar, it is possible that none of the individual reports will be identified as an outlying unit. At the same time, the UI level residual aggregated from all the responding establishments may be extreme and very different from other responding UI accounts in the cell.

The residuals may legitimately have very different values depending on the employment size class of the sampled UI accounts. To remove the effect of the size class, the residuals are “centered” within each size class, i.e., the average of the residuals within size classes is subtracted from each original residual.
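The residual, aggregation, and centering steps can be sketched as follows. The dictionary keys (`ui`, `size_class`, `weight`, `ae_prev`, `ae_curr`) are hypothetical, each UI account is assumed to belong to a single size class, and the centering uses a simple unweighted mean within each size class.

```python
from collections import defaultdict


def centered_ui_residuals(reports, link_relative):
    """Return {ui: centered residual} for one estimating cell.

    reports: iterable of dicts with illustrative keys
             ui, size_class, weight, ae_prev, ae_curr.
    """
    ui_residual = defaultdict(float)
    ui_size = {}
    for r in reports:
        # Equation 2: weighted residual for one establishment.
        d = r["weight"] * (r["ae_curr"] - link_relative * r["ae_prev"])
        # Aggregate establishment residuals to the UI account level.
        ui_residual[r["ui"]] += d
        ui_size[r["ui"]] = r["size_class"]

    # Average UI-level residual within each size class.
    by_size = defaultdict(list)
    for ui, d in ui_residual.items():
        by_size[ui_size[ui]].append(d)
    size_mean = {s: sum(v) / len(v) for s, v in by_size.items()}

    # Center: subtract the size-class average from each UI residual.
    return {ui: d - size_mean[ui_size[ui]] for ui, d in ui_residual.items()}


centered = centered_ui_residuals(
    [
        {"ui": "A", "size_class": 1, "weight": 2, "ae_prev": 10, "ae_curr": 12},
        {"ui": "A", "size_class": 1, "weight": 2, "ae_prev": 20, "ae_curr": 20},
        {"ui": "B", "size_class": 1, "weight": 1, "ae_prev": 10, "ae_curr": 10},
        {"ui": "C", "size_class": 2, "weight": 1, "ae_prev": 100, "ae_curr": 90},
    ],
    link_relative=1.0,
)
```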

The cut-off values are determined separately for the extreme positive and extreme negative residuals. The procedures are similar for the positive and negative residuals and are described here only for the positive residuals.

First, sort the residuals in each cell in descending order. Let $d_i$ denote the i-th largest centered positive residual:

$d_1 \ge d_2 \ge \dots \ge d_n$

Set the value $F_1 = d_1$ and $F_2 = 2d_2 - d_1$.

The general formula for $F_k$ is:

$F_k = (k + 1)d_k - (d_1 + d_2 + \dots + d_k)$.

Proceed with the computations of the sequence $F_1, \dots, F_k$ until, at some step $k$, $F_k \le 0$. Typically, this point is reached after only a few steps. Next, compute the cut-off value as the point between residuals $d_k$ and $d_{k-1}$ as follows:

###### Equation 3. Determining the cutoff value among residuals

$L = a \cdot d_k + (1 - a) \cdot d_{k-1}$,

where,

$a = \dfrac{0.8 F_{k-1}}{0.8 F_{k-1} - 0.2 F_k}$
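A minimal Python sketch of the cut-off search for positive residuals follows. The function name is hypothetical, and returning `None` when no $F_k \le 0$ is found is an assumption made for this example rather than documented CES behavior.

```python
def positive_cutoff(residuals):
    """Return the cut-off L for positive centered residuals, or None.

    Computes F_k = (k + 1) * d_k - (d_1 + ... + d_k) over the residuals
    sorted in descending order and stops at the first k with F_k <= 0.
    """
    d = sorted((x for x in residuals if x > 0), reverse=True)
    running_sum = 0.0
    f_prev = None
    for k, d_k in enumerate(d, start=1):
        running_sum += d_k
        f_k = (k + 1) * d_k - running_sum
        if f_k <= 0:
            # k >= 2 here because F_1 = d_1 > 0, so f_prev is set.
            a = 0.8 * f_prev / (0.8 * f_prev - 0.2 * f_k)
            d_k_minus_1 = d[k - 2]  # list is 0-indexed
            return a * d_k + (1 - a) * d_k_minus_1
        f_prev = f_k
    return None  # assumption: no extreme residual detected


cutoff = positive_cutoff([100.0, 10.0, 9.0, 8.0])
```

For the residuals 100, 10, 9, 8, the sequence is $F_1 = 100$ and $F_2 = -80$, so $a = 80 / 96$ and the cut-off falls at 25, between the extreme residual and its nearest neighbor.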

The initial adjustments for units whose residuals are greater than L are

The cut-off values defined using the described procedure are always placed between some neighboring ordered residuals, so that all the residuals to the right of a cut-off value are greater than the cut-off value, although they may be very close to it. Based on the historical CES estimates, it has been found that the following rules for determining the final adjustments work the best.

• For certainty units: if the initial adjustment is less than 0.5, then a unit is declared to be atypical, representing only itself. The atypical data is removed from the matched sample set and is not used in the estimation of the sample link relative.
• If the initial adjustment for a certainty unit is greater than or equal to 0.5, no intervention is required and the adjustment is reset to 1.
• For noncertainty units: if the initial adjustment is less than or equal to 0.3, then a unit is declared atypical and it is not used in the sample link relative estimation. If the initial adjustment is greater than 0.4, then the final adjustment is reset to 1.
• Adjustments between 0.3 and 0.4 are applied to the sample weight. The final adjusted weight is required to be equal to or greater than 1. For example, if the resulting adjusted weight falls below 1, the final weight is reset to 1.
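The rules above can be expressed as a small decision function. This is an illustrative sketch: the function name, return convention, and the treatment of an initial adjustment of exactly 0.4 (applied to the weight here) are assumptions, and the initial adjustment is taken as an input rather than computed.

```python
def final_adjustment(initial_adjustment, sample_weight, is_certainty):
    """Return (status, final_weight) for one unit.

    status is "atypical", "typical", or "downweighted". final_weight is
    None for atypical units, which are removed from the link relative
    and represent only themselves.
    """
    if is_certainty:
        if initial_adjustment < 0.5:
            return ("atypical", None)
        return ("typical", sample_weight)  # adjustment reset to 1
    if initial_adjustment <= 0.3:
        return ("atypical", None)
    if initial_adjustment > 0.4:
        return ("typical", sample_weight)  # adjustment reset to 1
    # Adjustment in (0.3, 0.4]: apply to the weight, with a floor of 1.
    return ("downweighted", max(1.0, initial_adjustment * sample_weight))
```

For example, a noncertainty unit with a weight of 2 and an initial adjustment of 0.35 would have an adjusted weight of 0.7, which is reset to 1.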

An intervention, such as a weight adjustment, into the regular estimation procedure would reduce the variance of the estimate but it may introduce a bias. Therefore, the intervention, especially in samples of moderate to large sizes, is done with caution. For example, it is possible that there exist units in the non-sampled part of the population that are similar to the influential observations in the sample. Moderating the effect of the sample’s influential units may lead to a reduction in the representativeness of the sample. Since the nonsampled part of the population is not available, it is difficult to judge the amount and the need of intervention based only on the observed sample. One way to protect against unwarranted intervention is to verify its necessity using historical CES estimates. If the estimate falls within the historically observed bounds, then the intervention is deemed unnecessary and the weight adjustments are discarded.

At the very first step of the procedure, the sample link relative estimate is used when defining the residuals. This estimate may itself be affected by the extreme influential observations. Therefore, the whole procedure is performed a second time. The atypical units determined during the initial run are not used in calculating the adjustment factors during the second run.

The reports identified by the robust estimation techniques are treated as atypical in the link-relative technique, while all other matched sample responses are treated as typical in the link-relative technique.

### Accounting for large events outside matched sample

The probability-based survey design, weighted link relative estimator, and net birth-death model (described below) account for the vast majority of employment change from business expansions, contractions, openings, and closings. Exceptionally large, known employment change events are accounted for in CES-SA when they are not adequately captured in the matched sample and birth–death model.

The Bureau of Labor Statistics (BLS) works with state workforce agencies (SWA) to investigate publicly reported non-sample employment events, such as large business births, deaths, and strikes. This includes confirming the number of jobs affected and the event’s timing in relation to the CES reference period. If confirmed, estimates are adjusted by the portion of the event not captured by usual estimation procedures.

Nonresponse from certainty units that historically exhibit consistent, seasonal differences from the rest of the population (Quarterly Census of Employment and Wages, QCEW) is accounted for through imputation in the matched sample. These units are referred to as key nonrespondents. If an establishment reported CES data for the prior month (t-1) but not the current month (t), the matched sample uses the reported prior month CES data. Current month data is imputed by multiplying the reported prior month by the relative employment change in the prior year population data. If CES data was not reported in either the current or prior month, QCEW data is used to impute the level for both months in the matched sample. To capture the seasonal movement of the noncovered employment portion for nonrespondents in education, prior year CES reported data is used for the imputation if available; otherwise, QCEW data is used. Since key nonrespondents are by definition different from the rest of the population, they are treated as atypical, and therefore only represent themselves in the estimates.

###### Equation 4. Key nonrespondent imputation

$\widetilde{CES}_t = CES_{t-1} \times \dfrac{QCEW_{t-12}}{QCEW_{t-13}}$,

$\widetilde{CES}_{t-1} = QCEW_{t-13}$ if $CES_{t-1}$ is not reported

where,

$CES$ = CES employment;

$QCEW$ = QCEW employment;

$t$ = time (year/month); and

$\sim$ = designation for imputed data.
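The imputation rules described above can be sketched in Python as follows. The function and parameter names are hypothetical, and the sketch assumes that "prior year population data" means the QCEW levels 12 and 13 months before the current month. It does not cover the special handling of noncovered employment in education.

```python
def impute_key_nonrespondent(ces_prev, qcew_t12, qcew_t13):
    """Return (previous_month_level, current_month_level) for a key nonrespondent.

    ces_prev : reported CES employment for month t-1, or None if not reported
    qcew_t12 : QCEW employment for month t-12 (assumed lag)
    qcew_t13 : QCEW employment for month t-13 (assumed lag)
    """
    if qcew_t13 == 0:
        raise ValueError("QCEW employment for t-13 must be nonzero")
    # If the prior month was not reported, use the QCEW level from a year earlier.
    prev_level = qcew_t13 if ces_prev is None else ces_prev
    # Carry the prior month forward by last year's relative change in QCEW.
    curr_level = prev_level * qcew_t12 / qcew_t13
    return prev_level, curr_level
```

When the prior month is also missing, the two imputed levels reduce to the QCEW values for the corresponding months of the prior year.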

Overall, adjustments to account for nonsample events and key nonrespondents largely offset at the statewide total nonfarm level over the course of a given year. However, these adjustments may bring estimates of monthly employment change at the detailed industry and metropolitan area level substantially closer to population values.

### Special estimation situations

When a cell fails sample adequacy requirements, its direct sample-based estimates can experience high levels of relative variance. Though the cell’s robust sample-based employment estimate is still approximately unbiased for the target population value, its high variance can disguise true over-the-month movements. In these cases, employment is estimated by using models that trade off some of the immediacy and geographic focus of purely probability-based estimates for improved estimate stability.

### Small area model generation 3 (SAM Gen3)

Beginning with the release of January 2022 estimates, CES-SA implemented its third-generation small area model for estimating employment in domains with insufficient sample for robust, sample-based estimation. The first generation of small area models was implemented in 2003; the second generation, based on the Fay-Herriot model, was introduced in 2009.⁠2 The SAM Gen3 serves as a generalization of the Fay-Herriot model and loosens many of its assumptions.⁠3

###### Equation 5. Small Area Model Generation 3 link

$\hat{Y} = m_k + X\beta + u_{st} + u_d$

where,

$\hat{Y}$ = SAM Gen3 model link;

$m_k$ = intercepts derived by an unsupervised clustering process;

$X$ = matrix of predictor variables;

$\beta$ = vector of coefficients for intercepts and slopes;

$u_{st}$ = state random intercepts; and

$u_d$ = domain random intercepts.

Direct, sample-based estimates ($\hat{Y}_1$) are regressed on the historical 5-year same-month employment trend ($\hat{Y}_2$). A regression tree is run on the residuals from the linear regression $\hat{Y}_1 = a + \hat{Y}_2 \tilde{\beta} + e$ to provide a set of groupings. These groupings, along with the  values, and potentially additional information, are included in the $X$ matrix.

The SAM Gen3 co-models point estimates and their variances. The variance estimates used as a model input are derived using the repeatedly grouped balanced half-samples (RGBHS) replication method on the current month matched sample. A structure similar to a generalized variance function (GVF) is used to describe the model for variances in SAM Gen3. For more information on the RGBHS and GVF, see the section on variance estimation.

The resulting model-based estimate can be presented as a weighted average of the direct estimate ($\hat{Y}_1$), the state’s average estimate, and the regression-based (“synthetic”) part.

###### Equation 6. Alternate Small Area Model Generation 3 link

$\hat{Y} = W_1 \times \hat{Y}_1 + W_2 \times Y_{ST\,ADJ} + W_3 \times Y_{SYN}$

where,

$Y_{ST\,ADJ} = \bar{Y}_1 - (Y_{SYN} - \bar{Y}_{SYN})$;

$Y_{SYN} = m_k + X\beta$;

$\bar{Y}_1, \bar{Y}_{SYN}$ = weighted state averages of $\hat{Y}_1$ and $Y_{SYN}$;

$W_i \ge 0$; and

$W_1 + W_2 + W_3 = 1$.
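Equation 6 can be evaluated directly once its components are known. The sketch below takes the three weights as given inputs; in SAM Gen3 they result from the fitted model, which is not reproduced here. The function and parameter names are hypothetical.

```python
def sam_composite(y_direct, y_syn, y_bar_direct, y_bar_syn, weights):
    """Weighted average of direct, state-adjusted, and synthetic parts.

    weights: (W1, W2, W3), each nonnegative and summing to 1. They are
    supplied by the caller here; this sketch does not derive them.
    """
    w1, w2, w3 = weights
    if min(weights) < 0 or abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must be nonnegative and sum to 1")
    # State-adjusted component as written in equation 6.
    y_st_adj = y_bar_direct - (y_syn - y_bar_syn)
    return w1 * y_direct + w2 * y_st_adj + w3 * y_syn


blended = sam_composite(1.02, 1.00, 1.01, 1.005, (0.5, 0.3, 0.2))
```

A domain with a precise direct estimate would carry a large first weight, so the composite stays close to the direct estimate; a noisy direct estimate shifts weight toward the state-average and synthetic components.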

The model shrinks the estimate from the domain’s direct sample-based estimate ($\hat{Y}_1$) toward the respective state’s average and synthetic estimates. The amount of shrinkage depends on the variance of the direct estimate relative to the variability of the other components.

The joint model is estimated using variational Bayes in Stan software. The estimation process includes an unsupervised clustering algorithm to detect structure in the data not defined by the groupings in the $X$ matrix and state random effects. Intercepts are derived for each domain based on their probability of being in each cluster.

For all data types other than AE, CES-SA uses the robust weighted-difference-link-and-taper estimator applied to the matched sample. The difference link accounts for the over-the-month change in the sampled units, while the tapering feature keeps the estimates close to the overall sample average over time by using a composite base derived from the prior month’s estimate and the matched sample average. The taper is considered a level correction because no benchmark source exists for non-AE data types. This estimator promotes continuity by heavily favoring the previous month’s estimate (usually by 90 percent). Reported microdata may be identified as atypical, in which case the atypical microdata is subtracted from the base and the matched sample before estimation, then added back to represent the individual establishment only.

#### Production and nonsupervisory employees

CES-SA estimates a ratio of PE to AE (PER) using the weighted-difference-link-and-taper formula. The resulting ratio is then multiplied by the current month’s AE estimate to obtain a current estimate of PE.

###### Equation 7. Production and nonsupervisory employees

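$\widehat{PE}_c = \left( \left( \widehat{AE}_c - \sum_j ae^*_{c,j} \right) \times \widehat{PER}_c \right) + \sum_j pe^*_{c,j}$

where,

$\widehat{PER}_c = \left( \alpha \times \widehat{PER}_p \right) + \beta \times \dfrac{\sum_i (w_i \times r_i \times pe_{p,i}) - \sum_j (w_j \times pe^*_{p,j})}{\sum_i (w_i \times r_i \times ae_{p,i}) - \sum_j (w_j \times ae^*_{p,j})}$

$+ \dfrac{\sum_i (w_i \times r_i \times pe_{c,i}) - \sum_j (w_j \times pe^*_{c,j})}{\sum_i (w_i \times r_i \times ae_{c,i}) - \sum_j (w_j \times ae^*_{c,j})}$

$- \dfrac{\sum_i (w_i \times r_i \times pe_{p,i}) - \sum_j (w_j \times pe^*_{p,j})}{\sum_i (w_i \times r_i \times ae_{p,i}) - \sum_j (w_j \times ae^*_{p,j})}$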

Variables used in equations 7 to 8 are defined below equation 9.

#### Hours and Earnings series

The CES-SA hours and earnings series are derived from reports of payrolls and the corresponding paid hours for all employees and for various types of production employees. Hours and earnings are for private-sector employees.

#### Average weekly hours

Estimates of average weekly hours (AWH), shown in equation 8, use a composite base of the prior month’s AWH estimate and sample ratio of total work hours-to-AE and apply the change in the sample average ratios to set the over-the-month change.

The estimator for average weekly hours for production and nonsupervisory employees takes the same form as average weekly hours for all employees, where AE and AWH represent estimates of PE and AWH for PE, respectively, and the matched sample totals ae and wh represent matched sample totals for production employees and weekly hours for production and nonsupervisory employees, respectively.

###### Equation 8. Average weekly hours

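$\widehat{AWH}_c = \alpha \times \widehat{AWH}_p + \beta \times \dfrac{\dfrac{\sum_i w_i \times r_i \times wh_{p,i} - \sum_j w_j \times wh^*_{p,j}}{\sum_i w_i \times r_i \times ae_{p,i} - \sum_j w_j \times ae^*_{p,j}} \times \left( \widehat{AE}_p - \sum_j ae^*_{p,j} \right) + \sum_j wh^*_{p,j}}{\widehat{AE}_p}$

$+ \dfrac{\dfrac{\sum_i w_i \times r_i \times wh_{c,i} - \sum_j w_j \times wh^*_{c,j}}{\sum_i w_i \times r_i \times ae_{c,i} - \sum_j w_j \times ae^*_{c,j}} \times \left( \widehat{AE}_c - \sum_j ae^*_{c,j} \right) + \sum_j wh^*_{c,j}}{\widehat{AE}_c}$

$- \dfrac{\dfrac{\sum_i w_i \times r_i \times wh_{p,i} - \sum_j w_j \times wh^*_{p,j}}{\sum_i w_i \times r_i \times ae_{p,i} - \sum_j w_j \times ae^*_{p,j}} \times \left( \widehat{AE}_p - \sum_j ae^*_{p,j} \right) + \sum_j wh^*_{p,j}}{\widehat{AE}_p}$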

#### Average hourly earnings

Estimates of average hourly earnings (AHE), shown in equation 9, use a composite base of the prior month’s AHE estimate and sample ratio of total weekly payroll-to-total weekly hours and apply the change in the sample average ratios to set the over-the-month change.

The estimator for average hourly earnings for production and nonsupervisory employees (PE-AHE) takes the same form as average hourly earnings for all employees (AE-AHE), where AE, AWH, and AHE represent estimates of PE and their hours and earnings, and the matched sample totals pr and wh represent matched sample totals of payroll and work hours for PE.

###### Equation 9. Average hourly earnings

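$\widehat{AHE}_c = \alpha \times \widehat{AHE}_p + \beta \times \dfrac{\dfrac{\sum_i w_i \times r_i \times pr_{p,i} - \sum_j w_j \times pr^*_{p,j}}{\sum_i w_i \times r_i \times wh_{p,i} - \sum_j w_j \times wh^*_{p,j}} \times \left( \widehat{WH}_p - \sum_j wh^*_{p,j} \right) + \sum_j pr^*_{p,j}}{\widehat{WH}_p}$

$+ \dfrac{\dfrac{\sum_i w_i \times r_i \times pr_{c,i} - \sum_j w_j \times pr^*_{c,j}}{\sum_i w_i \times r_i \times wh_{c,i} - \sum_j w_j \times wh^*_{c,j}} \times \left( \widehat{WH}_c - \sum_j wh^*_{c,j} \right) + \sum_j pr^*_{c,j}}{\widehat{WH}_c}$

$- \dfrac{\dfrac{\sum_i w_i \times r_i \times pr_{p,i} - \sum_j w_j \times pr^*_{p,j}}{\sum_i w_i \times r_i \times wh_{p,i} - \sum_j w_j \times wh^*_{p,j}} \times \left( \widehat{WH}_p - \sum_j wh^*_{p,j} \right) + \sum_j pr^*_{p,j}}{\widehat{WH}_p}$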

Listed below are the variable definitions for equations 7–9, where,

$i$ = a matched CES report;

$I$ = the set of all matched CES reports;

$j$ = a matched CES report where the current month is atypical;

$J$ = the set of all matched CES reports where the current month is atypical (note that $J$ is a subset of $I$);

$*$ = indicates an atypical matched CES report;

$\alpha$ = 0.9;

$\beta$ = 0.1;

$c$ = indicates current month sample or estimate;

$p$ = indicates previous month sample or estimate;

$w$ = weight associated with a CES report;

$r$ = down-weight factor associated with the CES report;

$ae$ = reported all employees;

$pe$ = reported production and nonsupervisory employees;

$\widehat{AE}$ = estimated employment for all employees (or production and nonsupervisory if PE);

$\widehat{AWH}$ = estimated average weekly hours for all employees (or production and nonsupervisory employees when estimating PE hours);

$\widehat{AHE}$ = estimated average hourly earnings for all employees (or production and nonsupervisory employees when estimating PE earnings);

$\widehat{PER}$ = estimated ratio of production and nonsupervisory employees to all employees;

$wh$ = reported weekly hours for all employees (or production and nonsupervisory employees when estimating PE hours);

$pr$ = reported weekly payroll for all employees (or production and nonsupervisory employees when estimating PE earnings);

$\widehat{WH}$ = estimated aggregate weekly hours for all employees or production and nonsupervisory employees derived from estimates of average weekly hours and employment;

$\widehat{PR}$ = estimated aggregate weekly payrolls for all employees or production and nonsupervisory employees derived from estimates of average weekly hours, average hourly earnings, and employment; and

$b$ = net birth–death forecast for the current month.

For all variables used in the equations above:

• All estimated values are shown in upper case.
• All sample measures are shown in lower case and are based on a matched sample.
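
As an illustration only, the following Python sketch evaluates equation 9 for a single estimating cell. The record layout (dictionary keys `w`, `r`, `pr_p`, `wh_p`, `pr_c`, `wh_c`, and `atypical`) and the function names are assumptions made for this example; they do not describe CES production systems.

```python
ALPHA, BETA = 0.9, 0.1  # values given in the variable definitions


def sample_term(reports, month, agg_hours):
    """Payroll-to-hours term of equation 9 for month 'p' or 'c'."""
    pr, wh = f"pr_{month}", f"wh_{month}"
    atypical = [x for x in reports if x["atypical"]]
    # Weighted matched-sample totals minus the weighted atypical reports.
    typ_pr = (sum(x["w"] * x["r"] * x[pr] for x in reports)
              - sum(x["w"] * x[pr] for x in atypical))
    typ_wh = (sum(x["w"] * x["r"] * x[wh] for x in reports)
              - sum(x["w"] * x[wh] for x in atypical))
    # Atypical reports enter unweighted (they represent only themselves).
    aty_pr = sum(x[pr] for x in atypical)
    aty_wh = sum(x[wh] for x in atypical)
    return (typ_pr / typ_wh * (agg_hours - aty_wh) + aty_pr) / agg_hours


def ahe_current(ahe_prev, reports, wh_hat_p, wh_hat_c):
    """Composite base plus the over-the-month change in the sample terms."""
    term_p = sample_term(reports, "p", wh_hat_p)
    term_c = sample_term(reports, "c", wh_hat_c)
    return ALPHA * ahe_prev + BETA * term_p + term_c - term_p


# Hypothetical two-report cell with no atypical reports.
example_reports = [
    {"w": 1.0, "r": 1.0, "pr_p": 1000.0, "wh_p": 50.0,
     "pr_c": 1100.0, "wh_c": 50.0, "atypical": False},
    {"w": 1.0, "r": 1.0, "pr_p": 2000.0, "wh_p": 100.0,
     "pr_c": 2200.0, "wh_c": 100.0, "atypical": False},
]
example_ahe = ahe_current(20.0, example_reports, 150.0, 150.0)
```

In this example the sample ratio moves from 20 to 22 dollars per hour, and the prior estimate equals the prior sample ratio, so the current estimate follows the sample change exactly.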

### Definition of influential reports in non-AE Estimates

The process for selecting influential reports in non-AE estimates and adjusting their weights is closely related to the process for AE estimates, described in the section “Estimators for all employees,” and is used in estimating the number of production employees, as well as average weekly hours and average hourly earnings for both all employees and production employees.

The main difference between the AE and non-AE procedures lies in the measure of influence used. For AE estimates, the influence of a given respondent is considered to be the difference between prior month employment multiplied by the cell’s current sample link and current month reported employment, multiplied by the respondent’s selection weight. This approach is not appropriate for the weighted link and taper estimator used for non-AE. For non-AE, the influence is measured by recalculating the weighted link and taper estimate iteratively, removing each respondent one at a time. The difference between the estimate calculated with all respondents and the estimate calculated without a given respondent is considered to be that respondent’s influence.

###### Equation 10. Influence function for non-AE estimates

$IF_k = \hat{\theta} - \hat{\theta}_{-k}$

where,

$IF_k$ = the influence function for respondent $k$;

$\hat{\theta}$ = full-sample estimate; and

$\hat{\theta}_{-k}$ = estimate calculated excluding respondent $k$.
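
The leave-one-out calculation in equation 10 can be sketched in Python as follows. The `weighted_ratio` function is a simplified stand-in for the weighted link and taper estimator, and the tuple layout `(weight, payroll, hours)` is an assumption for this example.

```python
def weighted_ratio(reports):
    """Stand-in estimator (an assumption): weighted payroll over weighted hours."""
    return (sum(w * pr for w, pr, wh in reports)
            / sum(w * wh for w, pr, wh in reports))


def influence(reports, estimator):
    """Equation 10: full-sample estimate minus the estimate without respondent k."""
    full = estimator(reports)
    return [full - estimator(reports[:k] + reports[k + 1:])
            for k in range(len(reports))]


# Hypothetical respondents: (weight, weekly payroll, weekly hours).
respondents = [(1, 100, 10), (1, 100, 10), (1, 400, 10)]
influences = influence(respondents, weighted_ratio)
```

Here the third respondent pulls the estimate up by 10, while each of the other two pulls it down by 5, so the third would be the first candidate for review as an influential report.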

Beyond differences in influence functions, the non-AE weight adjustment algorithm diverges from the one used in AE in two ways. First, the adjustment factor used in selecting certainty units as atypical is 0.5 instead of 0.4. Second, the formula for calculating the cut-off point, from which adjustment factors are calculated, is closer to the second most influential positive and negative respondents: a is defined as 0.9F1/(0.9F1 – 0.1F2) instead of 0.8F1/(0.8F1 – 0.2F2). Both these modifications were chosen through empirical evaluation of simulated data.⁠4

The CES sample alone is not sufficient for estimating the total employment level because each month new firms open for business, generating employment that cannot be captured by the sample. There is an unavoidable lag between a firm opening for business and its appearance on the CES sample frame, which is built from unemployment insurance (UI) quarterly tax records. Although these records cover nearly all U.S. employers and include business births, they only become available for updating the CES sampling frame 7 to 9 months after the reference month. After establishment births appear on the frame, additional time is required for sampling, enrolling, and collecting payroll data. In practice, CES cannot sample and begin to collect data from new firms until they are at least a year old.

There is a somewhat different issue in capturing employment loss from business deaths through monthly sample collection. Businesses that have closed are unlikely to respond to the survey, and data collectors may not be able to ascertain until after the monthly collection period that firms have in fact gone out of business. As with business births, confirmation of business deaths eventually becomes available from the lagged UI tax records.

Difficulty in capturing information from business birth and death units is not unique to the CES; virtually all current business surveys face these limitations. CES adjusts for these limitations explicitly, using a statistical modeling technique in conjunction with the sample for estimating employment for private-sector industries. Without the net birth-death model-based adjustment, the CES nonfarm payroll employment estimates would be considerably less accurate.

#### Net birth-death model

The CES-SA program separates the net birth-death methodology into two steps. The process applies only to nonfarm industries in the private sector.

##### Step one

The first step is to exclude from the sample employment losses from business deaths to offset a portion of employment gains from business births. Research and data have shown that employment increases from firm births roughly offset employment decreases from firm deaths in most months; therefore, step one accounts for most of the net employment change from business births and deaths.

To accomplish step one, CES-SA excludes reports with zero employment from the sample. Nonresponding sample units are automatically excluded from the matched sample, because they have no data for the reference month. These exclusions from the matched sample result in an over-the-month change that is based solely on employment from continuing businesses and effectively keep employment from business deaths in the estimates to represent employment from business births that occur after the sample was selected.

Step one accounts for most of the birth-death employment, but not all of it.

##### Step two

The second step is to model the net birth-death employment residual not accounted for in step one. The CES-SA program adjusts its sample-based estimates for the net birth-death employment missed by step one using autoregressive integrated moving average (ARIMA) modeling, an econometric technique often used to analyze and forecast time series. From the Quarterly Census of Employment and Wages (QCEW) universe employment series, each establishment is classified as a continuing, birth, or death unit. The process outlined in step one is applied to the QCEW data. Over-the-month changes are calculated from the continuing units, and deaths are used to impute for births. The differences between simulated estimates and actual employment totals measured by the QCEW each month are input into the ARIMA model to derive the net birth-death employment forecast.

The CES-SA program updates the net birth-death inputs to the ARIMA model each year for all statewide 3-digit NAICS industries as part of its annual benchmarking process. An additional 3 months of net birth-death data are appended to the input dataset as each new quarter of QCEW data becomes available.

The inputs to the ARIMA model depend on the age of the selected sample, which is selected in 1 year and rotated into production by industry sector and quarter over the following year. Quarterly sample rotation (see data sources section) results in differently aged samples by industry sector and has an inherent effect on birth-death values. To control for sample age, net birth-death forecasts are based on either 1 year out or 2 years out from selection. Exhibit 4 shows which sample is used to produce estimates for the first quarter of year t through the fourth quarter of year t. In the first quarter of year t, estimates for all industries are based on sample selected in year t-1, and all net birth-death forecasts are based on births and deaths in the first year after selection. In the second quarter of year t, sample selected in year t-1 is used for mining and logging; for trade, transportation, and utilities; and for financial activities; the net birth-death forecasts for these industries are based on input data 1 year after the sample was selected. All other industries are based on the prior sample year t-2 and require a forecast based on net birth-deaths in the second year after selection. In the third quarter of year t, two additional industries, construction and leisure and hospitality, are rotated to the year t-1 sample and require a forecast for 1 year out, and the remaining industries on year t-2 samples require a forecast for 2 years out. This process repeats, with more industries rotating in on the newer sample and requiring the 1-year-out forecast, until the first quarter of year t+1, when all private-sector industries are based on the year t-1 sample.

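Exhibit 4. Sample rotation by industry and quarter for year t

| Industry | Major industry | Second quarter, Yt | Third quarter, Yt | Fourth quarter, Yt | First quarter, Yt+1 |
|---|---|---|---|---|---|
| Mining and logging | 10 | Year t-1 sample | Year t-1 sample | Year t-1 sample | Year t-1 sample |
| Trade, transportation, and utilities | 40 | Year t-1 sample | Year t-1 sample | Year t-1 sample | Year t-1 sample |
| Financial activities | 55 | Year t-1 sample | Year t-1 sample | Year t-1 sample | Year t-1 sample |
| Construction | 20 | Year t-2 sample | Year t-1 sample | Year t-1 sample | Year t-1 sample |
| Leisure and hospitality | 70 | Year t-2 sample | Year t-1 sample | Year t-1 sample | Year t-1 sample |
| Information | 50 | Year t-2 sample | Year t-2 sample | Year t-1 sample | Year t-1 sample |
| Professional and business services | 60 | Year t-2 sample | Year t-2 sample | Year t-1 sample | Year t-1 sample |
| Other services | 80 | Year t-2 sample | Year t-2 sample | Year t-1 sample | Year t-1 sample |
| Manufacturing | 30 | Year t-2 sample | Year t-2 sample | Year t-2 sample | Year t-1 sample |
| Education and health services | 65 | Year t-2 sample | Year t-2 sample | Year t-2 sample | Year t-1 sample |

Source: U.S. Bureau of Labor Statistics.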

For each statewide 3-digit NAICS industry, the most recent 6 years of employment data from the longitudinal database (LDB) is partitioned into five 25-month frames plus one half-frame. The additional half-frame is necessary in order to compile 13 months of data to derive 12 months of over-the-month changes.

Exhibit 5. Frame composition

| Frame | First month of span | Last month of span |
|---|---|---|
| Frame 1 (half-frame) | March year t-1 | March year t |
| Frame 2 | March year t-2 | March year t |
| Frame 3 | March year t-3 | March year t-1 |
| Frame 4 | March year t-4 | March year t-2 |
| Frame 5 | March year t-5 | March year t-3 |
| Frame 6 | March year t-6 | March year t-4 |

Source: U.S. Bureau of Labor Statistics.

#### Forecasting net birth-death employment

The birth-death residuals are calculated working with the frames and treating each microdata unit in the same way that the CES-SA treats microdata. Starting with the QCEW microdata for March t-6, microdata is separated into groups of continuous units, death units, and birth units for each reference month from April t-6 through March t-5 as follows:

1. sum all employees from the QCEW microdata for each month to obtain total population employment for each industry by month;
2. continuous units, for each reference month, include all units with employment greater than zero in each month from March t-6 through the reference month. Sum continuous employment for each reference month and sum the matched employment for the previous month. Next calculate a link relative for that reference month by dividing total continuous employment in the reference month by matched employment from the prior month;
3. death units include all units with zero employment in the reference month but with employment greater than zero in each month prior to the reference month. For each death unit, multiply the prior month’s employment by the link relative from the continuous units to get imputed employment for the reference month. This process closely mirrors CES estimation methods, with the relative over-the-month change of continuing units applied to death employment to impute for business births;
4. discard the remaining birth units; and
5. use equation 11 to calculate the net birth-death value for each reference month, which is total population employment less the sum of continuous employment and imputed employment.

###### Equation 11. Net birth-death

Net birth-death value = population employment − (continuous employment + imputed employment)
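
The five steps and equation 11 can be sketched in Python for one industry and one reference month. The data layout (a dictionary mapping a unit identifier to a list of monthly employment, with index 0 as the March base month) and the function name are assumptions for this example.

```python
def net_birth_death(units, ref):
    """Net birth-death value for reference month index `ref` (ref >= 1).

    units: {unit_id: [employment by month]}, index 0 is the base month.
    """
    histories = list(units.values())

    # Step 1: total population employment in the reference month.
    population = sum(emp[ref] for emp in histories)

    # Step 2: continuous units have positive employment in every month
    # from the base month through the reference month.
    continuous = [e for e in histories if all(x > 0 for x in e[:ref + 1])]
    continuous_emp = sum(e[ref] for e in continuous)
    link = continuous_emp / sum(e[ref - 1] for e in continuous)

    # Step 3: death units have zero employment in the reference month and
    # positive employment in every prior month; impute with the link.
    deaths = [e for e in histories
              if e[ref] == 0 and all(x > 0 for x in e[:ref])]
    imputed_emp = sum(e[ref - 1] * link for e in deaths)

    # Step 4: birth units are discarded; they count only in the population.
    # Step 5: equation 11.
    return population - (continuous_emp + imputed_emp)


# Hypothetical microdata: one continuing unit, one death, one birth.
example_units = {"A": [100, 110], "B": [50, 0], "C": [0, 20]}
example_residual = net_birth_death(example_units, ref=1)
```

In this example the link relative is 1.1, the death unit is imputed at 55, and the residual is 130 minus 165, or about -35: the imputation for deaths overstates actual birth employment of 20.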

The residual net birth-death values within the first and second halves of each frame are chained together to form separate time series (one comprised solely of first-half of each year of data, the other of second-half of each year of data).

Continuous samples deteriorate over time as firm deaths are not replenished by births. So, while the older samples continue to be used for some industries, the net birth-death inputs must be aged with the inputs identified in the second year after selection using the same process as the forecast for the 1-year-old sample, but aged 1 year.

Over-the-month changes of all net birth-death values are input into the forecasting model for the 1-year-old sample and the 2-year-old sample separately. The resulting forecasts are selected for use in estimation based on the age of the sample used for a given industry and quarter.

#### Raking and distribution of birth-death forecasts

The cumulative state birth-death forecasts in a given month are raked to the forecasts derived at the national level by the CES-N program, as shown in equation 12. The difference between the national forecast and the sum of the state forecasts is distributed to each state-level forecast in proportion to the state's employment level relative to other states.

###### Equation 12. Raking birth-death factors

$BD^{RAKED}_{i,m,t} = BD^{UNRAKED}_{i,m,t} + \left( BD^{NATIONAL}_{m,t} - \sum_{i=1}^{51} BD^{UNRAKED}_{i,m,t} \right) \times \dfrac{EMP_{i,m,t}}{\sum_{i=1}^{51} EMP_{i,m,t}}$

where,

$BD^{UNRAKED}$ = ARIMA model-based state-level birth-death forecast;

$BD^{NATIONAL}$ = national level birth-death forecast;

$BD^{RAKED}$ = raked state-level birth-death forecast for use in estimation;

$EMP$ = QCEW-derived employment value;

$i$ = state;

$m$ = 3-digit NAICS industry; and

$t$ = time (year/month).

Raking is done to ensure that birth-death forecasts are not a source of divergence between the sum-of-state and national employment data. In addition, national forecasts are updated each quarter with the availability of additional QCEW data, and raking is performed each quarter to make use of the updated national values.
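
A minimal Python sketch of equation 12 for one industry and month follows. The two-state input and the function name `rake` are hypothetical; the sketch assumes dictionaries keyed by state.

```python
def rake(bd_unraked, emp, bd_national):
    """Equation 12: spread the national-minus-states gap by employment share."""
    gap = bd_national - sum(bd_unraked.values())
    total_emp = sum(emp.values())
    return {state: bd_unraked[state] + gap * emp[state] / total_emp
            for state in bd_unraked}


# Hypothetical two-state example (the actual sum runs over 51 areas).
unraked = {"A": 100, "B": 300}
employment = {"A": 1000, "B": 3000}
raked = rake(unraked, employment, bd_national=480)
```

The state forecasts sum to 400 against a national forecast of 480, so the gap of 80 is split 20 and 60 by employment share, and the raked forecasts sum to the national value.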

After raking, birth-death factors are distributed to detailed industry levels and metropolitan areas proportional to their employment level. For example, if employment in NAICS 722 at the statewide level is 50,000, its raked forecast in month t is 700, and NAICS 722511 employment in a Metropolitan Statistical Area (MSA) in the same state is 5,000, then the forecast for NAICS 722511 in that MSA in month t is: $700 \times \frac{5{,}000}{50{,}000} = 70$.

For estimating cells comprising multiple industries, birth-death factors for the component industries are summed for use in estimation after the raking and distribution procedures.
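
The distribution and summation steps can be sketched as follows. The first call reproduces the NAICS 722 example from the text; the second component industry and its values are hypothetical, added only to show the summation for a multi-industry cell.

```python
def distribute_forecast(raked_forecast, statewide_emp, cell_emp):
    """Share of a statewide raked forecast allocated to a detailed cell."""
    return raked_forecast * cell_emp / statewide_emp


# Example from the text: NAICS 722 statewide to NAICS 722511 in one MSA.
msa_722511 = distribute_forecast(700, 50_000, 5_000)

# Hypothetical second component industry in the same estimating cell.
msa_other = distribute_forecast(300, 20_000, 1_000)

# Multi-industry cell: component factors are summed after distribution.
cell_factor = msa_722511 + msa_other
```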

### Aggregation Procedures

CES-SA estimates are separated into five types of estimating cells: basic, summary, independent, independent-basic, and independent-summary.

• Basic estimates are calculated from the matched sample at an industry detail up to the 6-digit NAICS level (though less detailed in most cases). Some industries are composed of more than one NAICS-defined industry to ensure that all industries are included and that estimating cells have adequate sample coverage. Basic cells are then aggregated to higher levels called summary cells.
• Summary estimates are calculated from basic and/or other summary cells.
• Independent estimates are calculated from the matched sample like basic estimates; however, they do not aggregate to higher levels.
• Independent-basic cells are calculated from the matched sample and aggregate to higher levels, but not total nonfarm or total private.
• Independent-summary cells are calculated from independent-basic and/or other independent-summary cells, but do not aggregate to total nonfarm or total private.

Aggregation starts at the most detailed industry-ownership level in each state and area, up to the 6-digit NAICS. All 6-digit industries that begin with the same 5 digits are aggregated to that 5-digit NAICS. All 5-digit industries that begin with the same 4 digits are then aggregated to that 4-digit industry. This process continues with all detailed industries falling into 1 of 17 major industry sectors, which are further aggregated into 10 supersectors, then goods-producing, private service-providing industries, and finally, to total private and total nonfarm. (See exhibits 6 and 7.) Total nonfarm represents the highest summary estimate for AE in all states and areas, while total private represents the highest possible summary estimate for all other data types.

Exhibit 6. Major industry sectors

| Current Employment Statistics industry code | Major sector name | North American Industry Classification System codes included and type of ownership |
|---|---|---|
| 10-000000 | Mining and logging | 1133, 21 / Private |
| 20-000000 | Construction | 23 / Private |
| 31-000000 | Durable goods manufacturing | 33, 32¹ / Private |
| 32-000000 | Nondurable goods manufacturing | 31, 32¹ / Private |
| 41-420000 | Wholesale trade | 42 / Private |
| 42-000000 | Retail trade | 44–45 / Private |
| 43-000000 | Transportation and warehousing | 48–49 / Private |
| 44-220000 | Utilities | 22 / Private |
| 50-000000 | Information | 51 / Private |
| 55-000000 | Financial activities | 52, 53 / Private |
| 60-000000 | Professional and business services | 54, 55, 56 / Private |
| 65-000000 | Education and health services | 61, 62 / Private |
| 70-000000 | Leisure and hospitality | 71, 72 / Private |
| 80-000000 | Other services | 811, 812, 813 / Private |
| 90-910000 | Federal government | All in-scope NAICS / Federal government |
| 90-920000 | State government | All in-scope NAICS / State government |
| 90-930000 | Local government | All in-scope NAICS / Local government |

¹ CES allocates 3-digit NAICS industries to this major industry sector based on industry description.

Source: U.S. Bureau of Labor Statistics.

Exhibit 7. Aggregate industry sectors

| Current Employment Statistics industry code | Aggregate sector name | Sectors included |
|---|---|---|
| 00-000000 | Total nonfarm | 05-000000 Total private, 90-000000 Government |
| 05-000000 | Total private | 06-000000 Goods-producing, 08-000000 Private service-providing |
| 06-000000 | Goods-producing | 10-000000 Mining and logging, 20-000000 Construction, 30-000000 Manufacturing |
| 07-000000 | Service-providing | 40-000000 Trade, transportation, and utilities, 50-000000 Information, 55-000000 Financial activities, 60-000000 Professional and business services, 65-000000 Education and health services, 70-000000 Leisure and hospitality, 80-000000 Other services, 90-000000 Government |
| 08-000000 | Private service-providing | 40-000000 Trade, transportation, and utilities, 50-000000 Information, 55-000000 Financial activities, 60-000000 Professional and business services, 65-000000 Education and health services, 70-000000 Leisure and hospitality, 80-000000 Other services |
| 30-000000 | Manufacturing | 31-000000 Durable goods, 32-000000 Nondurable goods |
| 40-000000 | Trade, transportation, and utilities | 41-420000 Wholesale trade, 42-000000 Retail trade, 43-000000 Transportation and warehousing, 44-220000 Utilities |
| 90-000000 | Government | 90-910000 Federal government, 90-920000 State government, 90-930000 Local government |

Source: U.S. Bureau of Labor Statistics.

As noted in the section on stratification, CES-SA uses a top-down estimation approach where the sum of AE estimates is constrained to values derived from a parallel estimation super sector (ESS). After adjustments are made to AE data to control their summation to the ESS, AE data are re-aggregated for publication.

State and area structures may include series that represent summations of industries not found in NAICS—including total private; mining, logging, and construction; goods-producing; private service-providing; and government. These series may be basic or summary cells and are incorporated into the aggregation structure.

##### Employment

All employee (AE) and production and nonsupervisory employee (PE) data types use the same method for aggregation. Basic level estimates rounded to the hundreds are summed to the next higher summary-level estimate and then rounded according to the published precision. The process repeats at each level as data are aggregated to higher level summaries.

##### Average weekly hours

Average weekly hours are published in hours rounded to the tenths place, and rounded values are used in aggregation. Aggregate or summary levels of average weekly hours (AWH) are weighted by employment in component industries. Estimates of AWH at the basic levels are multiplied by employment estimates to calculate aggregate hours. Aggregate hours are summed for the basic industries and then divided by their summed employment. The process repeats at each level as data are aggregated to higher level summaries. The aggregation method for AWH of AE and PE is identical with the appropriate substitution of AE or PE employment and hours data in equation 13.

###### Equation 13. Summary level average weekly hours

$AWH_j = \dfrac{\sum_{i,j} AWH_{i,j} \times Emp_{i,j}}{\sum_{i,j} Emp_{i,j}}$

where,

$AWH_{i,j}$ = average weekly hours of industry $i$ in summary industry $j$;

$AWH_j$ = average weekly hours of summary industry $j$; and

$Emp_{i,j}$ = employment of industry $i$ in summary industry $j$.
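
Equation 13 is an employment-weighted average, sketched below in Python. The input layout (a list of `(awh, emp)` pairs for the component industries) and the values are hypothetical; rounding for publication is left out of the sketch.

```python
def summary_awh(cells):
    """Equation 13: aggregate hours divided by summed employment.

    cells: list of (average weekly hours, employment) for component industries.
    """
    aggregate_hours = sum(awh * emp for awh, emp in cells)
    total_emp = sum(emp for _, emp in cells)
    return aggregate_hours / total_emp


# Two hypothetical component industries.
awh_summary = summary_awh([(40.0, 1000), (30.0, 3000)])
```

The larger industry dominates: the result is 32.5 hours rather than the simple average of 35.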

##### Average hourly earnings

Average hourly earnings and average weekly earnings are published in dollars rounded to the cent, and rounded values are used in aggregation. Estimates of average hourly earnings (AHE) at the basic level are multiplied by estimates of employment and by average weekly hours to calculate aggregate payroll, which is then summed to the next higher summary industry. The summed payroll data are then divided by the sum of aggregate hours (employment times average weekly hours) of all industries in the summary industry. The aggregation method of AHE for AE and PE is identical with the appropriate substitutions of AE or PE employment, hours, and earnings values in equation 14.

###### Equation 14. Summary level average hourly earnings

$AHE_j = \dfrac{\sum_{i,j} AHE_{i,j} \times AWH_{i,j} \times Emp_{i,j}}{\sum_{i,j} AWH_{i,j} \times Emp_{i,j}}$

where,

$AHE_j$ = average hourly earnings of summary industry $j$;

$AHE_{i,j}$ = average hourly earnings of industry $i$ in summary industry $j$;

$AWH_{i,j}$ = average weekly hours of industry $i$ in summary industry $j$; and

$Emp_{i,j}$ = employment of industry $i$ in summary industry $j$.
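
Equation 14 weights earnings by aggregate hours rather than by employment alone. A short Python sketch, using a hypothetical input layout of `(ahe, awh, emp)` triples for the component industries:

```python
def summary_ahe(cells):
    """Equation 14: summed aggregate payroll divided by summed aggregate hours.

    cells: list of (average hourly earnings, average weekly hours, employment).
    """
    aggregate_payroll = sum(ahe * awh * emp for ahe, awh, emp in cells)
    aggregate_hours = sum(awh * emp for _, awh, emp in cells)
    return aggregate_payroll / aggregate_hours


# Two hypothetical component industries with equal employment.
ahe_summary = summary_ahe([(20.0, 40.0, 100), (30.0, 20.0, 100)])
```

Although employment is equal in the two industries, the first works twice the hours, so the summary value (about 23.33 dollars) sits closer to 20 than to 30.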

### Benchmark

Annually, the CES-SA benchmarks, or re-anchors, the sample-based employment estimates to the universe counts derived principally from the QCEW program. This benchmark process is done to account for statistical sampling and non-sampling errors (including response, nonresponse, and birth-death) that accumulate over the year. The QCEW counts provide a point-in-time census of payroll employment: they are not subject to sampling error and directly capture job growth from business births and deaths. However, the QCEW data are not as timely as the sample-based estimates and are subject to some forms of non-sampling error.

With each annual benchmark, the standard practice for state and area series is to revise 21 months of not seasonally adjusted data before the normal monthly estimation processes begin on the new levels. For example, with the development of the 2021 benchmark, levels were reestablished for the April 2020 through September 2021 reference months (18 months). UI tax reports are not collected on a timely enough basis to replace CES payroll estimates for the fourth quarter, October 2021 to December 2021 (3 months). For this period, estimates are revised using the new September 2021 series level derived from the census employment counts. New sample-based estimates are developed from those levels that incorporate updated business birth/death factors and new or revised microdata.

A snapshot of the QCEW is the starting point for building the CES benchmark level. Employment that is covered under the CES definition, but not by state unemployment insurance (UI) tax laws is added to these levels. Approximately 97 percent of total nonfarm employment within the scope of the establishment survey is covered by UI. The remaining 3 percent, called noncovered employment (NCE), is present only in select industries, including employees such as railroad workers, insurance agents, student workers, and clergy. State workforce agencies (SWAs) are required to provide NCE counts to CES. SWAs mainly obtain NCE counts from the Railroad Retirement Board (RRB), the Surface Transportation Board (STB), and U.S. Census County Business Patterns, but some states conduct surveys and/or use other administrative data.

The total of the QCEW and noncovered employment (referred to as the population) replaces sample-based estimates for all state and metropolitan series. For benchmark year 2021, the employment levels for April 2020 through September 2021 were replaced by these population counts. October through December 2021 re-estimates were generated as sample-based estimates linked from the new September 2021 levels. The links used for re-estimates may differ slightly from those used to derive preliminary estimates, because they include data from respondents who reported too late for inclusion in the previously published estimates, updated microdata, as well as new birth-death factors and model inputs. Figure 2 illustrates the timing of the benchmark replacement period and re-estimates for a given benchmark year y.

Figure 2. Illustration of benchmark replacement period

(Timeline figure showing the months January through December for year y-1 and year y, with the replacement period marked in both years and the re-estimates marked at the end of year y.)

Source: U.S. Bureau of Labor Statistics.

The first 6 months of each benchmark replace the last 6 months of the prior benchmark’s population counts. The next 3 months replace the prior benchmark’s re-estimates, while the remaining 9 months replace sample-based estimates linked from the last benchmark. Those first 6 months are updated with more recent versions of the QCEW and new noncovered employment counts. For the 2021 benchmark (April 2020 through September 2021), estimates are replaced once as part of the 2021 benchmark process. The April 2021 through September 2021 employment data are replaced again a year later as part of the 2022 benchmark process.

In the process of replacing the employment estimates, CES evaluates known administrative issues in the QCEW. The presence of noneconomic code changes (NECCs) is one such issue. Each year approximately one-third of all establishments in the QCEW are contacted as part of the Annual Refiling Survey (ARS). State workforce agencies contact companies and ask them to verify their North American Industry Classification System (NAICS) industry classification, location, and ownership. Ownership can be private industry; or federal, state, or local government. Based on ARS responses, updates are made to the QCEW data with first-quarter data. These administrative changes are referred to as NECCs. The effect of changes that represent less than 6 percent of the employment in a series is distributed across 12 months. This distribution is called a wedge.

###### Equation 15. Net noneconomic code changes wedge adjustment

$Wedge j = times; times;$

where,

$NECC_{Net}$ = net employment moving to series because of noneconomic code changes;

$Wedge$ = monthly adjustment to benchmark employment for net NECC; and

$j$ = number of months into the replacement period. For benchmark year $t$, $j = 1$ for April of year $t-1$.

For a positive net NECC, employment is gradually added into the benchmark series from April through December, but subtracted in January and February, since the code change is implemented in the first quarter. The reverse is true for a negative net NECC. March reflects the full effect of the NECC.

For changes of 6 percent or more, CES staff lengthens the number of months across which the employment change is distributed. This requires editing historical data, which for benchmark year 2021 would be months prior to April 2020. In some cases, the time series is reconstructed using historical QCEW data instead of a wedge. If an establishment has a large NECC, CES staff can add its historical employment values to the series it should have been in all along, throughout the series history. This employment is then subtracted from the series it should not have been in, again throughout series history. This in effect is a correction to historical data for an error in coding (industry, geographic area, and ownership), and makes the history more accurate.

Wedges are also used to address series breaks in the presence of other noneconomic changes in the QCEW data, for example, a company improving its reporting detail from a statewide aggregate to individual worksite levels. Noneconomic changes in noncovered employment values, such as new calculation methods and changes based on noisy source data, are also wedged back in history.

Another type of adjustment is applied to most local government series which have an educational component. CES treats faculty that are not paid during the summer months as employed, which is not the case in the QCEW. Large drops in reported faculty during the summer months are added back to CES microdata values. To bring the benchmarked series in line with CES definitions of faculty employment, the monthly sample-based estimates of employment change are used to fill in benchmark employment for June through August.

###### Equation 16. Local government education summer adjustment

$BMK_j = BMK_{May} \times \prod_{i=1}^{j} \hat{\theta}_i + \dfrac{j}{4} \times \left( BMK_{Sept.} - BMK_{May} \times \prod_{i=1}^{4} \hat{\theta}_i \right)$

where,

$BMK$ = benchmark employment level;

$\hat{\theta}$ = CES link relative estimate; and

$i, j$ = number of months since May (1 = June, 2 = July, 3 = August, 4 = September).
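
Equation 16 links forward from the May benchmark with the CES link relatives and then closes one-quarter of the remaining gap to the September benchmark per month. A Python sketch with hypothetical benchmark levels and link relatives:

```python
from math import prod


def summer_benchmark(bmk_may, bmk_sept, links, j):
    """Equation 16 for month j (1 = June, 2 = July, 3 = August, 4 = September).

    links: CES link relatives for June, July, August, and September, in order.
    """
    linked = bmk_may * prod(links[:j])
    # Gap between the September benchmark and the fully linked September level.
    gap = bmk_sept - bmk_may * prod(links[:4])
    return linked + j / 4 * gap


# Hypothetical series: flat sample links, benchmark rises 40 from May to September.
flat_links = [1.0, 1.0, 1.0, 1.0]
summer_levels = [summer_benchmark(1000.0, 1040.0, flat_links, j)
                 for j in (1, 2, 3, 4)]
```

By construction the September value (j = 4) equals the September benchmark, so the filled-in June through August levels connect the two benchmark points without a break.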

Following the revision of employment estimates, estimates for production and nonsupervisory employees are recomputed using the revised all-employee estimates and the previously computed sample ratios of these workers to all employees.

#### Noncovered employment

Noncovered employment results from a difference in scope between the CES-SA program and the QCEW program, whose employment counts are derived from UI tax reports filed by individual firms. While most firms are required to pay UI tax for their employees, some types of employees are exempt from their states’ UI tax laws but are still within scope for the CES survey. Examples of the types of employees that may be exempt are: college students paid by their school as part of a work study program; interns of hospitals paid by the hospital for which they work; employees paid by state and local government and elected officials; independent or contract insurance agents; employees of nonprofits and religious organizations (the largest group of employees not covered); and railroad employees covered under a different system of UI administered by the RRB. This employment needs to be accounted for in order to set the benchmark level for CES employment.

State workforce agencies (SWAs) are responsible for providing monthly counts of noncovered employment to BLS as part of the annual benchmarking process. Laws on UI coverage vary from state to state, and SWAs review changes to these laws each year in order to identify industries where noncovered employment may be present. In several states, corporate officers are exempt from UI coverage, and, as a result, noncovered employment exists in most NAICS industries in those states. Noncovered values for corporate officers in Idaho, North Dakota, Oregon, and Washington, and other state-specific UI exemptions are used in benchmarking the CES national as well as state and area estimates.

No single source of noncovered data exists; therefore, SWAs use several sources to generate the employment counts, including County Business Patterns (CBP) and the Annual Survey of Public Employment and Payroll (ASPEP), both from the U.S. Census Bureau; the RRB; state-conducted surveys; and other administrative data available to the state.

CES also develops national noncovered levels for March of the benchmark year, extrapolates these levels to each state (based on the state-submitted values during the previous year), and provides the state-specific levels to the SWAs, as shown in equation 17.

###### Equation 17. Extrapolation of noncovered employment from national to state levels

$N_{s,i,t} = \dfrac{R_{s,i,t-1}}{\sum_{s=1}^{51} R_{s,i,t-1}} \times N_{i,t}$

where,

$N$ = CES National noncovered employment;

$R$ = State-level noncovered employment;

$s$ = state;

$i$ = industry; and

$t$ = year.

These values are used in benchmarking state and area employment when more accurate values are unavailable. For information on how the CES national noncovered values are derived, see the calculation section of the CES-N Handbook of Methods.
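The share-based allocation in equation 17 can be illustrated with a minimal Python sketch. The function name, the state codes, and the employment figures are hypothetical and chosen only for illustration; only three states are shown, whereas the equation sums over all 51.

```python
def extrapolate_noncovered(national_level, prior_state_levels):
    """Allocate a national noncovered level to states by prior-year shares.

    national_level: national noncovered employment for one industry in year t.
    prior_state_levels: dict of state -> state-submitted noncovered
        employment for the same industry in year t-1.
    """
    total = sum(prior_state_levels.values())
    if total == 0:
        raise ValueError("prior-year state levels sum to zero; shares are undefined")
    return {
        state: level / total * national_level
        for state, level in prior_state_levels.items()
    }


# Hypothetical prior-year state values for a single industry.
prior = {"ID": 200.0, "ND": 300.0, "OR": 500.0}
state_levels = extrapolate_noncovered(1100.0, prior)
```

Because each state receives its prior-year share of the national level, the state values sum back to the national level by construction.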

#### Development of new series

The primary source used in constructing new CES-SA employment time series is the BLS longitudinal database (LDB), which consists of establishment-level microdata from the QCEW and represents all employment covered by the UI system. The LDB contains the state, county, township, ownership (private industry; or federal, state, or local government), and NAICS codes that were assigned to each establishment in a given quarter. The LDB connects businesses reporting to the UI system across time in two ways that aid in reconstructing employment time series. First, establishments that changed UI account numbers but represent the same business location are linked together with a common identifier (a unique “LDB number” for each establishment). Second, the LDB tracks more complicated predecessor–successor relationships where changes in reporting may be administrative rather than economic in nature. These kinds of relationships may exist when old and new UI reporting units share some physical assets but do not represent the exact same worksites. An example is a firm that changes from reporting all of its jobs in one report to reporting separately about individual worksites. The establishments newly reported on do not represent actual business births, so it would be reasonable to impute some of the predecessor’s employment data onto them prior to the date of the administrative change. For time-series construction, each establishment involved in a predecessor–successor transaction is given an adjustment value based on its most recent relationship. For example, if a worksite represented 10 percent of its firm’s employment when reporting was broken out in detail, then, prior to that point in time, 10 percent of its firm’s reported employment would have been imputed to that worksite.
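The 10-percent example above can be sketched in Python as follows. This is a simplified illustration under the assumption that a single adjustment value, taken from the breakout period, is applied to every earlier period; the function name and the figures are hypothetical.

```python
def backcast_worksite(firm_history, share):
    """Impute a worksite's pre-breakout employment as a fixed share
    of its firm's reported employment in each earlier period."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return [share * employment for employment in firm_history]


# Hypothetical data: at the breakout, the worksite reports 110 jobs
# and the firm reports 1,100 in total, so the adjustment value is 0.10.
share = 110 / 1100
firm_history = [1000, 1020, 1050]  # firm totals before the breakout
imputed = backcast_worksite(firm_history, share)
```

Here the worksite would be assigned roughly 100, 102, and 105 jobs in the three periods before the administrative change.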

Industry, area, and ownership code changes, which may be for either economic or noneconomic reasons, also occur in the LDB. Economic code changes represent a change in business activity and are recorded in the quarter in which they occurred. These changes are included in the time series. Unlike economic code changes, noneconomic code changes (NECCs) are administrative in nature and therefore are adjusted before their inclusion in a time series, in order to eliminate series breaks. With the aim of reducing the number of noneconomic breaks, the LDB is adjusted so that each establishment is given its final (i.e., most recently assigned) codes.

The sum of LDB employment—adjusted for predecessor–successor transactions—is then tallied for each industry, county, township, and ownership level. To these totals, employment data for LDB records with unclassified county or town codes is distributed on the basis of the proportion of employment in each county and town, for every NAICS and ownership code. Employment data associated with unassigned NAICS codes are distributed proportionally to other industries within a county or town. Records that lack NAICS and county or town codes are distributed to counties and towns on the basis of their proportion of total CES-assigned employment within the state and then distributed proportionally to all industries.
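The proportional distribution described above can be sketched for the simplest case, in which employment with an unclassified county code is spread across counties within one NAICS and ownership cell. The function name and the figures are hypothetical; the same pattern would apply to unassigned NAICS codes.

```python
def distribute_unclassified(classified, unclassified):
    """Add unclassified employment to each category in proportion
    to that category's share of classified employment."""
    total = sum(classified.values())
    if total == 0:
        raise ValueError("no classified employment to base proportions on")
    return {
        key: value + unclassified * value / total
        for key, value in classified.items()
    }


# Hypothetical cell: three counties plus 50 jobs with no county code.
county_emp = {"County A": 600.0, "County B": 300.0, "County C": 100.0}
adjusted = distribute_unclassified(county_emp, 50.0)
```

The adjusted totals preserve the original county proportions while accounting for all employment in the cell.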

##### Employment not covered by the LDB

The CES program works with states each year to review UI laws and determine an appropriate noncovered employment (NCE) value for each industry and area, as described in the section on benchmarking. In developing new series constructions, NCE values are derived for the most recent year, and ratios of noncovered-to-covered employment are calculated and used to estimate historical NCE levels.
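A minimal sketch of the ratio approach follows. It assumes that one noncovered-to-covered ratio from the most recent year is applied unchanged to every historical period; the handbook does not state whether the ratio varies over time, so this is an illustrative simplification. The function name and the figures are hypothetical.

```python
def backcast_noncovered(recent_nce, recent_covered, historical_covered):
    """Estimate historical NCE levels by applying the most recent
    noncovered-to-covered ratio to historical covered employment."""
    if recent_covered <= 0:
        raise ValueError("recent covered employment must be positive")
    ratio = recent_nce / recent_covered
    return [ratio * covered for covered in historical_covered]


# Hypothetical cell: 50 noncovered jobs per 1,000 covered jobs in the latest year.
historical_nce = backcast_noncovered(50.0, 1000.0, [800.0, 900.0])
```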

Regular faculty members with contracts of at least 1 year at primary and secondary schools, colleges, and universities are counted as employed for the entire year in the CES survey, whether or not they receive pay year round. Many school faculty members do not get paid during summer breaks and are not counted under QCEW employment definitions, creating an additional difference in scope that requires adjustment.

Noncovered employment totals and summer faculty adjustments are added to the sum of LDB employment when new AE series are constructed.

##### Non-all employee data

When new series are added for non-AE data types, the available matched sample is used to construct histories. An initial sample average is calculated to provide a starting point for the series and then the sample data are processed through the robust weighted link and taper.

Time series published by CES-SA can exhibit regularly recurring seasonal movements. Seasonal adjustment eliminates the part of the series attributable to normal seasonal variation and makes it possible to observe the cyclical and other non-seasonal movements in CES-SA series. Seasonally adjusted series are published monthly for selected nonfarm payroll employment estimates.

### X-13ARIMA-SEATS and special model adjustments

CES-SA uses the X-13ARIMA-SEATS software, developed and maintained by the U.S. Census Bureau, to seasonally adjust CES-SA series. Models are set annually, and seasonal adjustment is performed on a concurrent basis. The X-13ARIMA-SEATS software is available on the U.S. Census Bureau website at https://www.census.gov/data/software/x13as.html.

CES-SA defaults to using 10 years of data as an input to X-13ARIMA-SEATS. If a historical correction prior to the default 10 years was made during the annual benchmarking process, the start year of the input universe time series will be the earliest corrected year. For newly published series, a minimum of 3 years of data is required before the series can be processed by X-13ARIMA-SEATS.

Because of the nature of the CES program, BLS controls for survey interval variations, sometimes referred to as the 4- versus 5-week effect, during seasonal adjustment. Although the CES survey is referenced to a consistent concept—the pay period including the 12th of each month—inconsistencies arise because there are sometimes 4 and sometimes 5 weeks between the weeks including the 12th in a given pair of months. In highly seasonal industries, these variations can be an important determinant of the magnitude of seasonal hires or layoffs that have occurred at the time the survey is taken, thereby complicating seasonal adjustment.
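The 4- versus 5-week interval can be computed directly from the calendar. The sketch below assumes that a week runs from Sunday through Saturday, which is an assumption made here for illustration and is not stated in the text; the function names are hypothetical.

```python
from datetime import date, timedelta


def reference_week_start(year, month):
    """Return the Sunday that begins the week containing the 12th."""
    twelfth = date(year, month, 12)
    # date.weekday() returns Monday=0 ... Sunday=6, so this is the
    # number of days elapsed since the most recent Sunday.
    days_since_sunday = (twelfth.weekday() + 1) % 7
    return twelfth - timedelta(days=days_since_sunday)


def survey_interval_weeks(year, month):
    """Number of weeks between the reference week of the previous
    month and the reference week of the given month."""
    if month == 1:
        prev_year, prev_month = year - 1, 12
    else:
        prev_year, prev_month = year, month - 1
    delta = reference_week_start(year, month) - reference_week_start(prev_year, prev_month)
    return delta.days // 7


# Survey intervals for each month of one year.
intervals_2023 = {m: survey_interval_weeks(2023, m) for m in range(1, 13)}
```

Under this week definition every interval is either 4 or 5 weeks, and which months receive a 5-week interval shifts from year to year.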

Standard seasonal adjustment methodology relies heavily on the experience of the most recent 3 years to determine the expected seasonal change in employment for each month of the current year. Prior to the implementation of the adjustment, the procedure did not distinguish between 4- and 5-week survey intervals, and the accuracy of the seasonal expectation depended in large measure on how well the current year's survey interval corresponded with those of the previous 3 years. All else equal, the greatest potential for distortion occurred when the current month being estimated had a 5-week interval but the 3 years preceding it were all 4-week intervals; or, conversely, when the current month had a 4-week interval but the 3 years preceding it were all 5-week intervals.

BLS uses REGARIMA (regression with autocorrelated errors) modeling to identify the estimated size and significance of the calendar effect for each published series. REGARIMA combines standard regression analysis, which measures the numerical relationship among two or more variables, with ARIMA modeling, which describes and predicts the behavior of data series based on its own history. For many economic time series, including nonfarm payroll employment, observations are auto-correlated over time; each month's value is significantly dependent on the observations that precede it. These series, therefore, usually can be successfully fit using ARIMA models. If autocorrelated time series are modeled through regression analysis alone, the measured relationships among other variables of interest may be distorted because of the influence of the autocorrelation. Thus, the REGARIMA technique is appropriate for measuring relationships among variables of interest in series that exhibit autocorrelation, such as nonfarm payroll employment.

In this application, the relationships of interest are those between employment levels in individual calendar months and the lengths of the survey intervals for those months. The REGARIMA models evaluate the variation in employment levels attributable to 11 separate survey interval variables, one specified for each month, except March. March is excluded because there are almost always 4 weeks between the February and March surveys. Models for individual basic series are fit with the most recent 10 years of data available, the standard time span used for CES seasonal adjustment.

The REGARIMA procedure yields regression coefficients for each of the 11 months specified in the model. These coefficients provide estimates of the strength of the relationship between employment levels and the number of weeks between surveys for the 11 modeled months. The X-13ARIMA-SEATS software also produces diagnostic statistics that permit the assessment of the statistical significance of the regression coefficients, and all series are reviewed for model adequacy.

Because the 11 coefficients derived from the REGARIMA models provide an estimate of the magnitude of variation in employment levels associated with the length of the survey interval, these coefficients are used to adjust the CES data to remove the calendar effect.

### Two-step method

CES-SA uses a two-step seasonal adjustment method to address the hybrid nature of the time series, which consist of universe data through the latest benchmark month followed by sample data up to the current month. The seasonal variations of these two data sources can differ and cause distortions in the seasonally adjusted data unless adjusted separately.

Beginning with January 1994 data, BLS implemented an alternative method that separately adjusts each part of the time series, an approach first carried out by Berger and Phillips.⁠5 This method uses the seasonal trends found in universe-derived employment counts to adjust historical benchmark employment data while also incorporating sample-based seasonal trends to adjust sample-based employment estimates. These two series are independently adjusted then spliced together at the October re-estimate point. By accounting for the differing seasonal patterns found in historical benchmark employment data and the sample-based employment estimates, this technique yields improved seasonally adjusted series with respect to analysis of month-to-month employment change.

With the release of January 2018 data, CES-SA converted to concurrent seasonal adjustment, which uses all available estimates, including those for the current month, in developing sample-based seasonal factors. Previously, CES-SA projected sample-based seasonal factors once a year during the annual benchmark process and applied those projected seasonal factors to the not seasonally adjusted monthly estimates over the following year. More information on CES-SA’s use of concurrent seasonal adjustment is available at https://www.bls.gov/sae/seasonal-adjustment/implementation-of-concurrent-seasonal-adjustment-for-ces-state-and-area-estimates.htm.

CES-SA publishes seasonally adjusted data for nonfarm payroll employment series at the supersector and sector levels, as well as total nonfarm (TNF) for published metropolitan areas that have sufficient sample history and satisfy criteria for seasonal adjustment. Revisions of historical data for the most recent 5 years, where available, are made once a year, coincident with annual benchmark adjustments. If a historical correction is issued to the unadjusted supersector or sector series prior to the standard 5-year replacement period, seasonally adjusted data will be revised back to the earliest historically corrected year.

### Derivative data

Derivative series, plus annual and quarterly averages, are data derived from sample-based estimates. All hours and earnings derivative data are calculated for both AE and PE using their respective sample-based employment, hours, and earnings estimates. The following equations describe how these data are calculated.

###### Equation 18. Annual averages

$AE_{13} = \frac{\sum_{t=1}^{12} AE_t}{12}$

$PE_{13} = \frac{\sum_{t=1}^{12} PE_t}{12}$

$AWH_{13} = \frac{\sum_{t=1}^{12} WH_t}{\sum_{t=1}^{12} EMP_t}$

$AHE_{13} = \frac{\sum_{t=1}^{12} PR_t}{\sum_{t=1}^{12} WH_t}$

where,

13 = annual average. (Note: In the public data access tool, annual averages are shown as M13.)
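The annual averages in equation 18 can be sketched in Python. The sketch assumes that WH is aggregate weekly hours and PR is aggregate weekly payroll for the same data type as EMP; these readings are not defined alongside the equation and are labeled here as assumptions. The function name and the figures are hypothetical.

```python
def annual_averages(emp, wh, pr):
    """Compute annual averages from 12 monthly values.

    emp: monthly employment (AE or PE).
    wh:  monthly aggregate weekly hours (assumed meaning of WH).
    pr:  monthly aggregate weekly payroll (assumed meaning of PR).
    """
    if not (len(emp) == len(wh) == len(pr) == 12):
        raise ValueError("each input must contain 12 monthly values")
    return {
        "EMP13": sum(emp) / 12,        # simple mean of monthly employment
        "AWH13": sum(wh) / sum(emp),   # hours weighted by employment
        "AHE13": sum(pr) / sum(wh),    # earnings weighted by hours
    }


# Hypothetical flat year: 100 employees, 35 hours each, $30 per hour.
result = annual_averages([100] * 12, [3500] * 12, [105000] * 12)
```

Note that the hours and earnings averages are ratios of annual sums, so months with more employment or more hours carry more weight than they would in a simple mean of the 12 monthly averages.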

###### Equation 19. Average weekly earnings

$AWE_i = AWH_i \times AHE_i$

where,

$i = \text{current month}.$

###### Equation 20. All employees, 3-month average change, seasonally adjusted

$AE_{3mon} = \frac{AE_i - AE_{i-3}}{3}$

where,

$i = \text{current month}.$
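Equations 19 and 20 can be sketched in Python as follows. The function names and the figures are hypothetical; the 3-month calculation takes a chronological list of seasonally adjusted all-employee levels whose last element is the current month.

```python
def average_weekly_earnings(awh, ahe):
    """Average weekly earnings: average weekly hours times average hourly earnings."""
    return awh * ahe


def three_month_average_change(ae_sa):
    """Average monthly change over the latest 3 months.

    ae_sa: chronological seasonally adjusted AE levels, with at
        least 4 values so that month i-3 is available.
    """
    if len(ae_sa) < 4:
        raise ValueError("need at least 4 monthly levels")
    return (ae_sa[-1] - ae_sa[-4]) / 3


awe = average_weekly_earnings(34.5, 30.0)
avg_change = three_month_average_change([1000.0, 1004.0, 1009.0, 1015.0])
```

In the second example, employment rose by 15 over three months, so the 3-month average change is 5 per month.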

#### Diffusion indexes

Diffusion indexes are produced at the total nonfarm level as an aggregate of the 50 states and the District of Columbia, as well as an aggregate of 389 Metropolitan Statistical Areas (MSAs), over 1-, 3-, 6-, and 12-month time spans, and are used to measure the breadth of employment growth or decline. Seasonally adjusted data are used for all CES-SA diffusion indexes.⁠6

To derive the indexes, calculate the employment change over the 1-, 3-, 6-, or 12-month span for each component state or MSA. Assign a value of 0 to each employment decrease, 50 to each unchanged observation, and 100 to each employment increase over the selected time span. The diffusion index is the average (mean) of the assigned values. Diffusion indexes above 50 indicate that employment increased in more states or areas than it decreased over a given time span, while values under 50 indicate that decreases outnumbered increases.
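The diffusion index calculation can be sketched in Python. The function name, area labels, and employment levels are hypothetical; each series is a chronological list of seasonally adjusted levels.

```python
def diffusion_index(levels_by_area, span):
    """Mean of 0/50/100 scores for decreases, no change, and increases.

    levels_by_area: dict of area -> chronological employment levels.
    span: number of months over which to measure the change.
    """
    scores = []
    for series in levels_by_area.values():
        change = series[-1] - series[-1 - span]
        if change > 0:
            scores.append(100)
        elif change < 0:
            scores.append(0)
        else:
            scores.append(50)
    return sum(scores) / len(scores)


# Hypothetical 1-month span: two increases, one unchanged, one decrease.
areas = {
    "Area 1": [500.0, 505.0],
    "Area 2": [300.0, 301.0],
    "Area 3": [200.0, 200.0],
    "Area 4": [100.0, 98.0],
}
index_1mo = diffusion_index(areas, span=1)
```

With two increases, one unchanged area, and one decrease, the index is (100 + 100 + 50 + 0) / 4 = 62.5.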

##### Reliability

The CES survey, like other sample surveys, is subject to two types of error: sampling error and nonsampling error. The magnitude of sampling error, or variance, is directly related to the size of the sample and the percentage of universe coverage achieved by the sample. The CES sample of about 131,000 businesses and government agencies covers over one-third of total universe employment on average, typically yielding a small variance for the statewide total nonfarm estimates.

#### Benchmark revision as a measure of survey error

The sum of sampling and nonsampling error represents total survey error. Unlike most sample surveys that publish sampling error as their only measure of error, the CES derives an annual approximation of total error using lagged and independently derived universe data. While the benchmark revision is often used as a proxy measure of total error for the CES survey estimate, the QCEW is also subject to some forms of measurement error (though not sampling error), and therefore the revision reflects the net of the errors present in each program.⁠7 Benchmark revisions are published in the annual benchmark article for each state at the total nonfarm level, and summary statistics on the distribution of revisions are published for statewide supersectors and MSAs.

#### Revisions between preliminary and final data

The CES-SA program publishes first preliminary estimates of employment, hours, and earnings based on less than the total sample, typically on the third Friday immediately following the reference month. Revised sample-based estimates are published with the subsequent month’s preliminary estimates to allow for receipt of additional sample. Revisions are published on the BLS website each month.

#### Variance estimation

The estimation of sample variance for AE, available on the BLS website, is accomplished through use of generalized variance functions (GVFs). The generalized variance function connects the variance of an estimator to a set of variables the variance depends on, such as the size of employment, the number of respondents, and the fraction of the population represented by the sample. Variances estimated directly from the sample may be unstable—especially in smaller domains. A model-based GVF is a way to stabilize direct estimates of variances.

The parameters of the model used for GVF are estimated by fitting the model to a set of direct replication-based variances obtained from the repeatedly grouped balanced half-samples (RGBHS) procedure. RGBHS is a modification of the balanced half-samples (BHS) technique used by the CES-N program.

The estimation of sample variance for CES-N estimates is accomplished through Fay’s method of BHS. In Fay’s BHS method, the sample is repeatedly divided into halves using a systematic technique. For each replicate division, the original sample weights are adjusted in both halves of the sample: weights for units that belong to one half of the sample are multiplied by a factor of $1 + \gamma$, whereas weights for units in the other half of the sample are multiplied by a factor of $1 - \gamma$. Replicate estimates based on these adjusted weights are calculated using the same estimation formula as used for the full-sample estimate. The procedure is repeated $k$ times. The BHS variance is calculated by measuring the variability of the replicate estimates as described in equation 21.

###### Equation 21. Current Employment Statistics variance

$V_{BHS}(\hat{\theta}) = \frac{1}{\gamma^{2} k} \sum_{\alpha=1}^{k} \left( \hat{\theta}_{\alpha} - \hat{\theta} \right)^{2}$

where,

$\hat{\theta}_{\alpha} \text{ is the } \alpha\text{-th replicate estimate};$

$\hat{\theta} \text{ is the full sample estimate};$

$\gamma = \frac{1}{2}; \text{ and}$

$k \text{ is the number of replicates}.$

RGBHS is a modification of BHS that involves calculating the BHS estimates $R$ times. After each calculation, the sample units are randomly reassigned to two groups in order to produce new BHS estimates. Finally, the RGBHS variance estimate is obtained by averaging the BHS variance estimates as

$V_{RGBHS}(\hat{\theta}) = \frac{1}{R} \sum_{r=1}^{R} V_{BHS}^{r}(\hat{\theta})$

where,

$r$ stands for the $r$th reassignment of the random groups, for $r = 1, \dots, R$.

The GVF regression is then used to smooth the RGBHS variance estimates, as shown in equation 22.

###### Equation 22. Generalized variance function

$\log \hat{V}_{d} = \beta_{0} + \beta_{1} \log Y_{d,0} + \beta_{2} \log \left( \frac{Y_{d,s}}{n_{d}} \right) + \beta_{3} \log \left( \frac{Y_{d,0}}{Y_{d,s}} - 1 \right) + \varepsilon_{d}$

where,

$\hat{V}_{d} = \text{RGBHS variance estimate for domain } d;$

$Y_{d,0} = \text{the benchmark employment level};$

$Y_{d,s} = \text{the average unweighted sampled employment};$

$n_{d} = \text{the average number of respondents; and}$

$\varepsilon_{d} = \text{the regression residual}.$

The GVF modeled variance estimates are then used for publication. More information on the generalized variance function can be found at https://www.bls.gov/osmr/research-papers/2013/pdf/st130050.pdf.
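The replication step that feeds the GVF can be sketched in Python. The sketch assumes that the BHS variance is the sum of squared deviations of the k replicate estimates from the full-sample estimate, divided by γ²k, and that the RGBHS variance is the simple mean of the BHS variances across random regroupings. The function names and the replicate values are hypothetical, and the construction of the replicate weights themselves is not shown.

```python
def fay_bhs_variance(replicates, full_estimate, gamma=0.5):
    """BHS variance under Fay's method from k replicate estimates."""
    k = len(replicates)
    if k == 0:
        raise ValueError("at least one replicate estimate is required")
    squared_devs = sum((rep - full_estimate) ** 2 for rep in replicates)
    return squared_devs / (gamma ** 2 * k)


def rgbhs_variance(replicate_sets, full_estimate, gamma=0.5):
    """Average the BHS variances obtained from repeated regroupings.

    replicate_sets: one list of replicate estimates per regrouping.
    """
    variances = [
        fay_bhs_variance(replicates, full_estimate, gamma)
        for replicates in replicate_sets
    ]
    return sum(variances) / len(variances)


# Hypothetical example: two regroupings, four replicates each,
# around a full-sample estimate of 100.
sets = [[102.0, 98.0, 101.0, 99.0], [101.0, 99.0, 101.0, 99.0]]
v_first = fay_bhs_variance(sets[0], 100.0)
v_rgbhs = rgbhs_variance(sets, 100.0)
```

With γ = 1/2 the replicate weights are perturbed by only half as much as in standard BHS, and the division by γ² rescales the resulting variability back to the full-sample scale.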

##### Appropriate uses of sampling variances

Variance statistics are useful for comparison purposes, but they do have some limitations. Variances reflect the error components of estimates that are due to surveying only a subset of the population, rather than conducting a complete count of the entire population; however, they do not reflect nonsampling error, such as response errors, bias due to nonresponse, and model error for modeled estimates. The variances of the over-the-month change estimates are very useful in determining when changes are significant at some level of confidence.
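One common way to use a variance for this purpose is to compare the over-the-month change with a multiple of its standard error. The sketch below assumes a normal approximation and a 90-percent confidence level (critical value 1.645); both are assumptions made for illustration rather than details taken from this section. The function name and the figures are hypothetical.

```python
import math


def is_significant_change(change, variance, z_critical=1.645):
    """Return True when the absolute change exceeds the critical
    multiple of its standard error (the square root of the variance)."""
    if variance < 0:
        raise ValueError("variance cannot be negative")
    return abs(change) > z_critical * math.sqrt(variance)


# Hypothetical standard error of 2,500 jobs for the monthly change.
variance = 2500.0 ** 2
large_move = is_significant_change(5000.0, variance)
small_move = is_significant_change(3000.0, variance)
```

A test of this kind addresses sampling error only; it says nothing about the nonsampling errors noted above.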

Notes

⁠1 P.N. Kokic and P.A. Bell, “Optimal winsorizing cutoffs for a stratified finite population estimator,” Journal of Official Statistics, no. 10, (1994), pp. 419–35.

⁠2 R.E. Fay and R.A. Herriot, “Estimates of income for small places: an application of James-Stein procedures to census data,” Journal of the American Statistical Association, no. 74, (1979), pp. 269–77.

⁠3 Julie Gershunskaya and Terrance D. Savitsky, “Bayesian nonparametric joint model for point estimates and variances,” in Office of Survey Methods Research, 2019 (U.S. Bureau of Labor Statistics, November 2019), https://www.bls.gov/osmr/research-papers/2019/st190020.htm.

⁠4 Paige Schroeder, “How to catch an outlier: A robust method for hours and earnings estimation in the Current Employment Statistics Survey,” in Office of Survey Methods Research, 2019 (U.S. Bureau of Labor Statistics, September 2019), https://www.bls.gov/osmr/research-papers/2019/st190070.htm.

⁠5 Franklin D. Berger and Keith R. Phillips, “Solving the mystery of the disappearing January blip in state employment data,” Federal Reserve Bank of Dallas, 1994, https://www.dallasfed.org/~/media/documents/research/er/1994/er9402d.pdf.

⁠6 TJ Lepoutre, “Diffusion indexes of state and metropolitan area employment changes,” Monthly Labor Review (March 2022), https://www.bls.gov/opub/mlr/2022/article/diffusion-indexes-of-state-and-metropolitan-area-employment-changes.htm.

⁠7 Jeffrey Groen, “Sources of error in survey and administrative data: the importance of reporting procedures,” Journal of Official Statistics, no. 28, (2012), pp. 173–98.