U.S. Department of Labor, Bureau of Labor Statistics
Geographic Profile of Employment and Unemployment, 2014 (Bulletin 2873)

Appendix B: Sampling and estimation procedures and sampling error tables

Appendix B Tables: (PDF)

The estimates presented in this bulletin are based on annual averages of monthly data obtained from the Current Population Survey (CPS), a sample survey of the civilian noninstitutional population. The survey, conducted each month by the U.S. Census Bureau for the U.S. Bureau of Labor Statistics, provides comprehensive data on the labor force, including such characteristics as age, gender, race, Hispanic or Latino ethnicity, marital status, occupation, and industry. The survey also provides data on the characteristics of those not in the labor force.

Each month, trained interviewers collect information from a scientifically selected sample of about 60,000 eligible households. This sample, designed to represent the civilian noninstitutional population, also includes about 10,000 households in order to meet the requirements of the State Children’s Health Insurance Program (SCHIP) legislation. The SCHIP legislation required the Census Bureau to improve state estimates of the number of children who live in low-income families and lack health insurance. These estimates are obtained from the Annual Demographic Supplement to the CPS. In September 2000, the Census Bureau began expanding the monthly CPS sample in 31 states and the District of Columbia because of the SCHIP legislation.

Selected respondents in the eligible households are interviewed to obtain information about the employment status of each household member 16 years of age and older. The information that is collected pertains to a “reference week,” usually the calendar week (Sunday to Saturday) that includes the 12th of the month, with actual interviewing occurring during the week following the reference week—known as the “survey week.”

Sampling procedures

The 2014 sample encompasses 824 sample areas, with coverage of every state and the District of Columbia. It is based, to a large extent, on information about the distribution of the population as reported in the Census 2000 enumeration. (A redesigned Census 2000-based sample was phased in from April 2004 through July 2005.) The 824 areas were selected by dividing the entire area of the United States into 2,025 primary sampling units (PSUs). With some minor exceptions, a PSU consists of a county or a number of contiguous counties. Most metropolitan areas constitute separate PSUs.

To improve the efficiency of the sample, the 2,025 PSUs are grouped into strata within each state. Those PSUs which are in a stratum by themselves are called self-representing and are generally the most populous in each state. Other strata are formed by combining PSUs that are similar in such characteristics as population growth, proportion of Blacks and Hispanics, and distributions by occupation and industry and by age and gender. PSUs selected from these strata are non-self-representing, because each one chosen represents the entire stratum. One PSU is selected from each stratum, with the probability of selection proportional to the relative population size of the PSU.

In states with a SCHIP sample, the self-representing PSUs are the same for both the regular CPS and SCHIP. In most states, the same non-self-representing sample PSUs are in the sample for both the regular CPS and SCHIP; however, to improve the reliability of the SCHIP estimates in Maine, Maryland, and Nevada, the SCHIP non-self-representing PSUs are selected independently of the regular CPS sample PSUs, with replacement. The method for stratification of PSUs for SCHIP in these states is similar to that of the other stratifications, except that the stratification variable used is the number of people under age 18 with household income below twice the poverty level.

Within each of the selected PSUs, the number of households to be enumerated each month is determined in two steps. First, a sample of the unit’s census enumeration districts (EDs) is selected, with each ED’s probability of selection proportional to its population size. EDs are administrative units and contain, on average, about 300 households. Second, clusters of approximately four addresses (contiguous wherever possible) are selected to be enumerated within each designated ED.

Part of the sample is changed, or rotated, each month. A given rotation group is in the sample for 4 consecutive months, leaves the sample during the next 8 months, and then returns for another 4 consecutive months. A primary reason for rotating the sample is to minimize the lack of cooperation that may result from interviewing a constant panel indefinitely. The rotation plan provides for three-fourths of the sample to be identical from one month to the next and one-half to be identical with that from the same month a year earlier.
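The overlap fractions quoted above follow directly from the 4-8-4 pattern. The short Python sketch below checks them under the assumption (implied by the stated fractions) that one new rotation group enters the sample each month; the function names and month numbering are illustrative and are not part of the CPS documentation.

```python
def in_sample(start_month, month):
    """True if a rotation group that first entered in start_month is interviewed in month."""
    k = month - start_month  # months elapsed since the group entered the sample
    # 4 months in (k = 0..3), 8 months out (k = 4..11), 4 months in again (k = 12..15)
    return 0 <= k <= 3 or 12 <= k <= 15


def groups_in_sample(month):
    """Start months of every rotation group interviewed in the given month."""
    return {s for s in range(month - 15, month + 1) if in_sample(s, month)}


current = groups_in_sample(100)      # an arbitrary month index
next_month = groups_in_sample(101)
next_year = groups_in_sample(112)

month_overlap = len(current & next_month) / len(current)
year_overlap = len(current & next_year) / len(current)
print(len(current), month_overlap, year_overlap)
```

With eight rotation groups in the sample in any month, six are shared with the following month and four with the same month a year later.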

Methods of estimation

Under the methods of estimation used in the CPS, all of the results for a given month become available simultaneously and are based on returns from the entire sample of respondents. The estimation procedure involves weighting the data from each respondent by the inverse of the probability of the person being in the sample. The result gives a rough measure of the number of actual people that each sample person represents. Through a series of estimation steps (outlined next), the selection probabilities are adjusted for noninterviews and survey undercoverage; data from previous months are incorporated into the estimates through the composite estimation procedure.
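As a rough illustration of the first two ideas in this paragraph, the Python sketch below computes a base weight as the inverse of a selection probability and then inflates it to cover noninterviewed households in the same adjustment cell. The weighting-class form of the noninterview factor and the sample counts are assumptions made for the example; the bulletin does not give the exact formula.

```python
def base_weight(selection_probability):
    """Base weight: the inverse of the probability of being selected."""
    return 1.0 / selection_probability


def noninterview_factor(interviewed_weights, noninterviewed_weights):
    """Assumed weighting-class adjustment: spread the weight of occupied but
    noninterviewed households over the interviewed households in the same cell."""
    interviewed_total = sum(interviewed_weights)
    return (interviewed_total + sum(noninterviewed_weights)) / interviewed_total


# Hypothetical cell: 1-in-2,000 sampling, 93 interviewed and 7 noninterviewed households.
w = base_weight(1 / 2000)
factor = noninterview_factor([w] * 93, [w] * 7)
adjusted_weight = w * factor
print(round(w), round(factor, 4), round(adjusted_weight, 1))
```

After the adjustment, the 93 interviewed households carry the combined weight of all 100 sampled households in the cell.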

1. Noninterview adjustment. The weights for all interviewed households are adjusted to the extent needed to account for occupied sample households for which no information was obtained because of absence, impassable roads, refusals, or unavailability of the respondents for other reasons. This noninterview adjustment is made separately for clusters of similar sample areas that are usually, but not necessarily, contained within a state. Similarity of sample areas is based on metropolitan area status and size. Within each cluster, there is a further breakdown by residence. The proportion of sample households not interviewed averages about 7 percent to 8 percent, depending upon a number of factors, including weather and vacations.

2. Ratio estimates. The distribution of the population selected for the sample may differ somewhat, by chance, from that of the population as a whole in such characteristics as age, race, gender, and state of residence. Because these characteristics are closely correlated with labor force participation and other principal measurements made from the sample, the survey estimates can be substantially improved when weighted appropriately by the known distribution of the population characteristics. This task is accomplished through four stages of adjustment, as follows:

    a. First-stage ratio adjustment. The purpose of the first-stage ratio adjustment is to reduce the contribution to the variance of the sample state-level estimates arising from the sampling of PSUs. (There would still be variance associated with the state-level estimates even if the survey included all households in every sample PSU.) This kind of variance is called between-PSU variance. For some states, the between-PSU variance makes up a relatively large proportion of the total variance, whereas the relative contribution of the between-PSU variance at the national level is generally quite small. There are several factors to consider in determining what information to use in applying the first-stage adjustment: the information must be available for each PSU, be correlated with as many of the relevant statistics from the CPS as possible, and be reasonably stable over time so that the gain achieved from the ratio adjustment procedure does not deteriorate. The basic labor force categories (unemployed, nonagricultural employed, etc.) could be used; however, this information probably would fail the stability criterion. The distribution of the population by race (Black alone and non-Black alone) by age groups 0–15 years old and 16 years and older satisfies all three criteria, including stability.

    The use of the categories of Black alone and non-Black alone compensates for the fact that the racial composition of a non-self-representing (NSR) sample PSU could differ substantially from the racial composition of the stratum it is representing. This adjustment is not necessary for self-representing (SR) PSUs, because they represent only themselves. Adjustment factors are computed for the two race categories for each state containing NSR PSUs. The Black-alone and non-Black-alone cells are collapsed within a state when a cell meets one of four sampling criteria (see footnote 1). As a result of these criteria, the first-stage ratio adjustment actually is used (i.e., does not collapse to 1.0) in less than half of the states.

    b. National coverage adjustment. A national coverage adjustment was added to the CPS weighting process beginning in 2003. The purpose of the national coverage adjustment is to correct for interactions between race and ethnicity that are not addressed in the second-stage weighting. (See item “d” following.) Research has shown that the undercoverage of certain race–ethnicity combinations (e.g., non-Black Hispanic) cannot be corrected with second-stage adjustment alone. The national coverage adjustment also helps to speed the convergence of the second-stage adjustment, resulting in fewer iterations required to reach the final national controls. The national coverage adjustment factors are based on independently derived estimates of the population. Person records are grouped into four pairs on the basis of their month-in-sample (MIS). MISs 1 and 5, 2 and 6, 3 and 7, and 4 and 8 form the four pairs. Each MIS pair is then adjusted to age–gender–race–ethnicity population controls. Between 2 and 28 age cells are used, depending on which of the six major coverage groups (Black alone non-Hispanic, White alone non-Hispanic, White alone Hispanic, non-White alone Hispanic, Asian alone non-Hispanic, or residual race non-Hispanic) is being adjusted.

    c. State coverage adjustment. Besides a national coverage adjustment, a state coverage adjustment was added to the CPS weighting process beginning in 2003. The purpose of the state coverage adjustment is to adjust for state differences in age–gender–race coverage. Research has shown that estimates of characteristics of certain race groups (e.g., Blacks) can differ greatly from the controls if a state coverage adjustment is not used. However, unlike the national coverage adjustment, the state coverage adjustment slows the convergence of the second-stage ratio adjustment process. The state coverage adjustment is based on independently derived estimates of the population. Except for the District of Columbia, person records for non-Black alone are grouped into four pairs based on MIS, with the same MIS pairings (1 and 5, 2 and 6, 3 and 7, and 4 and 8) used as in the national coverage adjustment. Person records for Black alone for all states and non-Black alone for the District of Columbia are formed at the state level, with all months in the sample combined. For the Black-alone component of the adjustment, states are adjusted with the use of a varying number of age–gender–race cells based on the expected number of sample records in each age–gender cell. For example, for non-Black alone, all states except the District of Columbia are adjusted for three age groupings (0–15, 16–44, and 45 and older) by gender. Each cell is adjusted to independent age–gender–race population controls in each state.

    d. Second-stage ratio adjustment. The second-stage ratio adjustment is performed to decrease the variance of the vast majority of the CPS sample estimates. Because the labor force status of individuals in the general population is correlated with their specific geographic and demographic identification (e.g., teenagers and unemployment, or rural married women and labor force participation), the variance of the labor force estimates can be reduced by controlling the CPS sample estimates to independent estimates of selected geographic and demographic population categories. The procedure also is believed to reduce bias due to coverage errors. The procedure adjusts the weights for the sample to estimates within each MIS pair to control the sample estimates for a number of geographic and demographic subgroups of the population in order to ensure that these sample-based estimates of the population match independent population controls for each of the categories. These independent population controls are updated each month. Three sets of controls are used: (1) the civilian noninstitutional population for the 50 states and the District of Columbia by gender and age (0–15, 16–44, and 45 and older); (2) the national civilian noninstitutional population for 36 Hispanic and 36 non-Hispanic age–gender categories; and (3) the total national civilian noninstitutional population for 56 White, 36 Black, and 26 residual age–gender–race categories.

    The adjustment is done separately for each MIS pair (1 and 5, 2 and 6, 3 and 7, and 4 and 8). Because adjusting the weights to match one set of controls can cause differences in other controls, an iterative process is used to simultaneously control all variables. Successive iterations begin with the weights as adjusted by all previous iterations. Ten iterations are performed, resulting in (virtual) consistency between the sample estimates and the population controls.

    The independent population controls used for the CPS are produced by the Census Bureau’s Population Division. The CPS population controls are based on a demographic framework of population accounting. Under this framework, time series of population estimates and projections are anchored by the latest decennial census enumerations, with populations for dates since the latest decennial census derived from the estimation, or projection, of population change. In the simplest terms, information from a variety of data sources is used to derive estimates of population change by adjusting the resident population as enumerated in the latest decennial census for births, deaths, and net migration. Estimates of the resident population are adjusted to represent the civilian noninstitutional population 16 years of age and older (the eligible CPS population) by subtracting estimates of the number of residents under 16 years of age, the number of residents in the Armed Forces, and the number of residents who are institutionalized.
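    The iterative second-stage adjustment described above behaves like raking (iterative proportional fitting): weights are scaled to one set of controls, then the next, and the cycle is repeated. The Python sketch below is a toy version under stated assumptions. It uses two hypothetical control sets and four sample people, not the actual CPS control categories, and the function name `rake` is illustrative.

```python
def rake(weights, groups_by_dim, controls_by_dim, iterations=10):
    """Scale weights so weighted totals match each set of population controls.

    weights         -- starting weights, one per sample person
    groups_by_dim   -- for each control set, each person's category
    controls_by_dim -- for each control set, a dict {category: population control}
    """
    w = list(weights)
    for _ in range(iterations):  # the text reports ten iterations
        for groups, controls in zip(groups_by_dim, controls_by_dim):
            # Weighted total currently falling in each category of this control set.
            totals = {}
            for wi, g in zip(w, groups):
                totals[g] = totals.get(g, 0.0) + wi
            # Each pass begins with the weights as adjusted by all previous passes.
            w = [wi * controls[g] / totals[g] for wi, g in zip(w, groups)]
    return w


# Hypothetical toy data: four sample people and two control sets.
start = [100.0, 100.0, 100.0, 100.0]
gender = ["M", "M", "F", "F"]
age = ["16-44", "45+", "16-44", "45+"]
controls = [{"M": 210.0, "F": 230.0}, {"16-44": 260.0, "45+": 180.0}]

raked = rake(start, [gender, age], controls)
print([round(x, 2) for x in raked])
```

    Matching one set of controls generally disturbs agreement with the others, which is why the cycle is repeated until the sample estimates and all controls are virtually consistent.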

3. Composite estimation procedure. The last step in the preparation of most CPS estimates makes use of a composite estimation procedure. The composite estimate consists of a weighted average of two factors: (1) the second-stage ratio estimate based on the entire sample from the current month and (2) the composite estimate for the previous month, plus an estimate of the month-to-month change based on the six rotation groups common to both months. In addition, a bias adjustment term is added to the weighted average to account for relative bias associated with MIS estimates. The compositing procedure results in a further reduction in sampling error—that is, a reduction beyond that which is achieved after the two stages of ratio adjustment.
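The structure of the composite estimate can be sketched in a few lines of Python. The compositing coefficient `k` and all input values below are hypothetical (the bulletin gives no numerical coefficients), and the bias adjustment is passed in as an already computed term.

```python
def composite_estimate(current_ratio_estimate, previous_composite,
                       change_estimate, bias_adjustment=0.0, k=0.5):
    """Weighted average of the two components described in the text.

    k is a hypothetical compositing coefficient, not a published CPS value.
    """
    # Last month's composite carried forward by the measured month-to-month change.
    carried_forward = previous_composite + change_estimate
    return (1.0 - k) * current_ratio_estimate + k * carried_forward + bias_adjustment


# Hypothetical levels, in thousands.
estimate = composite_estimate(
    current_ratio_estimate=510.0,  # second-stage estimate from the full current sample
    previous_composite=500.0,      # composite estimate for the previous month
    change_estimate=6.0,           # change measured on the six common rotation groups
    k=0.5,
)
print(estimate)
```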

Effective with the release of January 1998 data, a new composite estimation method was implemented for the CPS. The new technique provides increased operational simplicity for microdata users and allows optimization of compositing coefficients for different labor force categories. Under the new procedure, weights are derived for each record. These weights, when aggregated, produce estimates consistent with those produced by the composite estimator. Under the previous procedure, composite estimation was performed at the macrolevel. The composite estimator for each tabulated cell was a function of the aggregated weights for respondents contributing to the cell in question in current and previous months. The different months of data were combined by use of compositing coefficients. Thus, microdata users needed several months of data to compute composite estimates. To ensure consistency, the same coefficients had to be used for all estimates. The values of the coefficients selected were much closer to optimal for unemployment values than for employment or labor force values.

The new composite weighting method involves two steps: (1) the computation of composite estimates for the main labor force categories, classified by important demographic characteristics, and (2) the adjustment of the microdata weights, through a series of ratio adjustments, to agree with these composite estimates, thus incorporating the effect of composite estimation into the microdata weights. Under this procedure, the sum of the composite weights of all sample people in a particular labor force category equals the composite estimate of the level for that category. Thus, to produce a composite estimate for a particular month, a data user needs simply to access the microdata file for that (single) month and compute a weighted sum. The new composite weighting approach also improves the accuracy of labor force estimates by using different compositing coefficients for different labor force categories. The weighting adjustment method ensures additivity while allowing variation in compositing coefficients.
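Because the composite weights already incorporate the effect of composite estimation, a single-month estimate reduces to a weighted sum. The Python sketch below shows the idea on a handful of made-up records; the field names `status` and `composite_weight` are assumptions for illustration and are not the variable names on the CPS microdata file.

```python
from collections import defaultdict


def composite_levels(records, weight_key="composite_weight", status_key="status"):
    """Sum composite weights within each labor force category."""
    totals = defaultdict(float)
    for record in records:
        totals[record[status_key]] += record[weight_key]
    return dict(totals)


# Hypothetical single-month microdata records.
records = [
    {"status": "employed", "composite_weight": 2100.0},
    {"status": "employed", "composite_weight": 1950.0},
    {"status": "unemployed", "composite_weight": 2050.0},
    {"status": "not_in_labor_force", "composite_weight": 1900.0},
]

levels = composite_levels(records)
labor_force = levels["employed"] + levels["unemployed"]
unemployment_rate = 100.0 * levels["unemployed"] / labor_force
print(levels, round(unemployment_rate, 1))
```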

Reliability of the estimates

The estimates in this bulletin are based upon a sample of the population rather than a complete count. Therefore, they may differ from the figures that would have been obtained if it had been possible to take a complete census using the same questionnaire and procedures that are used in the CPS. There are two types of errors in an estimate based on a sample survey: sampling error and nonsampling error. Tables B-2 through B-5 indicate the magnitude of the sampling error. They also partially measure the effect of some nonsampling errors in response and enumeration but do not measure any systematic biases in the data.

Sampling variability. The standard error is primarily a measure of sampling variability—that is, the variation that occurs by chance because a sample rather than the entire population is surveyed. The sample estimate and its standard error enable one to construct confidence intervals: ranges that would include the average result of all possible samples with a known probability. For example, if all possible samples were selected, each of these samples were surveyed under essentially the same conditions by use of the same sample design, and an estimate and its estimated standard error were calculated from each sample, then the following would occur:

  1. Approximately 68 percent of the intervals from 1 standard error below the estimate to 1 standard error above the estimate would include the average result of all possible samples.
  2. Approximately 90 percent of the intervals from 1.645 standard errors below the estimate to 1.645 standard errors above the estimate would include the average result of all possible samples.
  3. Approximately 95 percent of the intervals from 2 standard errors below the estimate to 2 standard errors above the estimate would include the average result of all possible samples.
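The three intervals listed above differ only in the multiplier applied to the standard error. A minimal Python sketch follows, using the multipliers stated in the list and a hypothetical estimate and standard error.

```python
# Multipliers of the standard error, as given in the list above.
MULTIPLIERS = {68: 1.0, 90: 1.645, 95: 2.0}


def confidence_interval(estimate, standard_error, level=90):
    """Return (lower, upper) bounds for the requested confidence level."""
    half_width = MULTIPLIERS[level] * standard_error
    return estimate - half_width, estimate + half_width


# Hypothetical estimate of 100 (thousands) with a standard error of 10 (thousands).
for level in (68, 90, 95):
    lower, upper = confidence_interval(100.0, 10.0, level)
    print(level, round(lower, 2), round(upper, 2))
```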

The error of a sample estimate varies inversely with the size of the sample and directly with the size of the estimate. Hence, an estimate for a subgroup constituting a small proportion of a population will tend to have a larger error relative to its size than will an estimate for a larger subgroup.

Reliability standards

The CPS sample design takes into consideration both national and state reliability. For the state data, a minimum reliability standard is set: an expected maximum coefficient of variation (CV) on the level of total unemployment of 8 percent annually. This CV is calculated with the assumption of a 6-percent unemployment rate. Because each state's sample design must meet the reliability standard, the CPS sampling rate differs by state. (The sampling rate is the proportion of all households that are selected for the sample.) Generally, the smaller the state population, the higher the sampling rate. Sampling rates range roughly from 1 in every 200 households to 1 in every 2,500 households in each stratum within the state.

Publication standards for state and area CPS data

To achieve comparability of the data for regions, divisions, states, metropolitan areas, metropolitan divisions, and cities for publication purposes, a unique requirement for minimum levels for the labor force and for employment and unemployment was developed for each area. This requirement is based on the known differences in sampling rates among these areas. Before estimates are published for a specific category (such as Hispanic unemployment in a particular state), a predetermined “critical cell” must meet a 50-percent CV requirement. As a result of this requirement, minimum bases for publication have been developed for each area. Table B-1 lists the minimum necessary base for publication of data in each of the census regions and divisions; in the states and the District of Columbia; and in the metropolitan areas, metropolitan divisions, and cities appearing in this bulletin.

Estimates are not shown when they do not meet the minimum base for the state or area listed in table B-1. In tables showing the labor force status of the population—that is, the number of employed and unemployed—publishability is determined by whether the labor force level exceeds the minimum base for unemployment in table B-1. If the labor force level is less than the unemployment minimum base, all data—labor force, employment, unemployment, and unemployment rate—are suppressed. In all other tables, the determining factor is whether the size of the base of the distribution exceeds the minimum base for employment or unemployment separately, depending on whether the table presents a distribution of employment or unemployment for the area or population subgroup. For example, in the table showing unemployed people by reason for unemployment, the entire line of data will be suppressed if the total unemployment is less than the minimum base for unemployment. If a subgroup appears in the table (such as a given gender or race), data for the subgroup also will be suppressed if the total for the reason in question does not meet the minimum base. Data are not published for any cell with a level of less than 500 people or less than 0.05 percent of the total for a given characteristic.
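Two of the suppression rules above can be expressed as simple checks. The Python sketch below covers only the labor force status rule and the 500-person cell rule; the minimum base used in the example is hypothetical (actual values are listed in table B-1), and treating a level exactly equal to the base as publishable is an assumption.

```python
MIN_CELL_LEVEL = 500  # people; the cell-level floor stated in the text


def labor_force_row_publishable(labor_force_level, unemployment_min_base):
    """Labor force status tables: labor force, employment, unemployment, and the
    unemployment rate are all suppressed when the labor force level is below the
    area's minimum base for unemployment."""
    return labor_force_level >= unemployment_min_base


def cell_publishable(level):
    """No cell with a level of fewer than 500 people is published."""
    return level >= MIN_CELL_LEVEL


# Hypothetical minimum base for unemployment; real values come from table B-1.
HYPOTHETICAL_UNEMPLOYMENT_BASE = 40_000

print(labor_force_row_publishable(35_000, HYPOTHETICAL_UNEMPLOYMENT_BASE))
print(labor_force_row_publishable(120_000, HYPOTHETICAL_UNEMPLOYMENT_BASE))
print(cell_publishable(450))
```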

Using the sampling error tables

Tables B-2 through B-5 provide sampling errors for use in constructing 90-percent confidence intervals (approximately 1.645 standard errors) for major labor force characteristics. The sampling errors provided are approximations and thus indicate the order of magnitude of the sampling error rather than the precise amount of the possible error in an estimate. Illustrations of the use of these tables follow. Where a computation is laid out step by step, the estimated levels are expressed in thousands of people.

Sampling error of an estimated number. Table B-5 shows that an estimate of 50,000 unemployed people in Michigan will have an absolute sampling error of 10,000, for a relative sampling error of 20 percent (10,000/50,000). In comparison, an estimate of 100,000 unemployed people in Michigan has an absolute sampling error of 14,000, yielding a relative sampling error of 14 percent (14,000/100,000). A statement that unemployment for a particular group is between 40,000 and 60,000 in the first instance, and between 86,000 and 114,000 in the second, can be made with approximately 90-percent confidence.

The latter statement can be interpreted as follows: if one were to draw all possible samples, make an estimate from each sample (using the same methods and techniques), and construct an interval around each estimate (with the sampling errors shown in the tables), then 90 percent of the intervals would contain the average value of all possible samples.

To convert a sampling error from 90-percent confidence, as displayed in the tables, to 68-percent confidence (1 standard error), multiply the sampling error shown in the tables by 0.63. To convert the sampling error from 90-percent to 95-percent confidence (approximately 2 standard errors), multiply the sampling error by 1.23. For the example given, the sampling error at 90-percent confidence is 10,000. At 68-percent confidence, the error would be about 6,300 (10,000 × 0.63). At 95-percent confidence, the error would be about 12,300 (10,000 × 1.23).
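The relative sampling error and the confidence-level conversions above are simple arithmetic. The Python sketch below reproduces the Michigan figures quoted in the text, using the bulletin's rounded conversion factors (0.63 and 1.23) as given; the function names are illustrative.

```python
# Conversion factors from the 90-percent sampling errors shown in the tables,
# as stated in the text.
FACTORS_FROM_90 = {68: 0.63, 90: 1.0, 95: 1.23}


def relative_error(sampling_error, estimate):
    """Sampling error as a percentage of the estimate."""
    return 100.0 * sampling_error / estimate


def convert_from_90(sampling_error_90, target_level):
    """Rescale a 90-percent sampling error to another confidence level."""
    return sampling_error_90 * FACTORS_FROM_90[target_level]


print(relative_error(10_000, 50_000))    # first Michigan example
print(relative_error(14_000, 100_000))   # second Michigan example
print(round(convert_from_90(10_000, 68)))
print(round(convert_from_90(10_000, 95)))
```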

Sampling error of a difference. To compute the error of a difference from the tables, an additional step is required. If, for instance, one wishes to know whether a change in the unemployment rate from one year to the next in a particular area for a particular population group is statistically significant or whether the difference in the unemployment rate between two areas or population groups is statistically meaningful, the significance of the difference needs to be computed. (Differences between estimates for 2 consecutive years may be influenced to some extent by a redesign of the CPS concepts, questionnaire, and collection procedures, such as the one that occurred in 1994.)

As noted before, differences can take two general forms: (1) differences between population groups and/or geographic areas, and (2) differences for the same population group and geographic area over time. Either type of difference can be calculated with the following formula, noting the limiting covariance assumption discussed later:

    $$SE_d = \left[ \left( SE_1^2 + SE_2^2 \right) - 2C \times \left( SE_1 \times SE_2 \right) \right]^{1/2}$$

    In this equation:

      $SE_d$ = the sampling error of the difference,
      $SE_1$ = the sampling error of one group or year,
      $SE_2$ = the sampling error of another group or year, and
      $C$ = the covariance (or relationship) term.

When the comparison is between different years, the values of $SE_1$ and $SE_2$ must be taken from the appropriate table of the Geographic Profile for each year, because sample sizes and, consequently, sampling errors may differ from year to year. Values for the covariance, or “C” term, for employment and unemployment for differences between consecutive years are as follows: for labor force or employment levels, C = 0.58; for unemployment levels or rates, C = 0.37. It is important to note that these C terms are usable only for calculating the sampling error of a difference for over-the-year change for the same geographic area and population group.

Covariance terms for the relationship between different population groups or geographic areas in this bulletin are not available. In calculating sampling errors for differences between two different population groups or geographic areas, a C term of zero must be assumed. The effect of this assumption is that (1) if the relationship between two groups, areas, or years (differences for nonconsecutive years) is small, then the C term can legitimately be ignored and the sampling errors will not be adversely affected, and (2) if there is a strong positive relationship between the two groups, areas, or years (differences for consecutive years), then the error computed without a C term will be overstated. An overstatement could lead one to state that a difference or change was not statistically significant when, in fact, it was. When there is a strong relationship over time for a characteristic such as employment (people tend to remain employed from one year to the next), the importance of using a C term to calculate the sampling error of a difference over time increases greatly.

The next example illustrates how to calculate the sampling error of a difference. Suppose one wished to know whether a hypothetical difference between an unemployment level of 250,000 for a particular population group in California and an unemployment level of 200,000 for the same group in New York was statistically significant at 90-percent confidence. Table B-5 gives the error for an unemployment level of 250,000 in California as approximately 22,000 and the error for an unemployment level of 200,000 in New York as 19,000. Using the formula described previously without the C term produces the following results (levels in thousands):

    $SE_1 = 22$; $SE_2 = 19$;
    $SE_1^2 + SE_2^2 = 484 + 361 = 845$;
    $SE_d = \left( SE_1^2 + SE_2^2 \right)^{1/2} = \sqrt{845} \approx 29$.

Because each state's sample is independent, there is no measurable correlation between the two estimates, and a C term of zero can be assumed. Thus, the error of the difference is approximately 29,000. Because the actual difference (50,000) is greater than the error of the difference, it can be stated with 90-percent confidence that the difference in the unemployment level is attributable to factors other than sampling variability alone.
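The California and New York comparison can be reproduced with a small Python function implementing the difference formula given earlier. The second call is a hypothetical over-the-year comparison that reuses the same two sampling errors with the C value the text gives for unemployment (0.37), only to show how a positive C term reduces the error of the difference.

```python
import math


def se_difference(se1, se2, c=0.0):
    """Sampling error of a difference; c is the covariance term (0 if unavailable)."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * c * se1 * se2)


# Levels in thousands, from the worked example in the text.
se_d = se_difference(22, 19)            # different states, so C = 0
significant = abs(250 - 200) > se_d     # difference exceeds its sampling error

# Hypothetical over-the-year change for one area and group, unemployment C = 0.37.
se_d_over_year = se_difference(22, 19, c=0.37)

print(round(se_d), significant, round(se_d_over_year, 1))
```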

Sampling errors for unemployment rates. Unemployment rates and error ranges for these rates are provided in tables 1, 14, and 27. This information can be used to derive a sampling error for an unemployment rate if one is needed. The error range is a 90-percent confidence interval around the unemployment rate. By subtracting the estimated unemployment rate from the upper bound of the range (or subtracting the lower bound of the range from the estimated unemployment rate), the sampling error for the rate can be obtained. This sampling error can then be used in the formula given previously for computing the sampling error of a difference, or for any other purpose the user chooses.
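A minimal Python sketch of this procedure follows. The two rates and their error ranges are hypothetical values, not figures from tables 1, 14, or 27, and a C term of zero is assumed because the comparison is between two different areas.

```python
import math


def rate_sampling_error(rate, upper_bound):
    """90-percent sampling error implied by a published error range."""
    return upper_bound - rate


def se_difference(se1, se2, c=0.0):
    """Sampling error of a difference, with covariance term c."""
    return math.sqrt(se1 ** 2 + se2 ** 2 - 2.0 * c * se1 * se2)


# Hypothetical rates (percent) and upper bounds of their published error ranges.
error_a = rate_sampling_error(6.0, 6.6)   # area A: 6.0 percent, range 5.4 to 6.6
error_b = rate_sampling_error(7.5, 8.3)   # area B: 7.5 percent, range 6.7 to 8.3

error_of_difference = se_difference(error_a, error_b)
significant = abs(7.5 - 6.0) > error_of_difference
print(round(error_a, 1), round(error_b, 1), round(error_of_difference, 2), significant)
```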

Interpolation and extrapolation. Although sampling errors are listed for selected levels of employment and unemployment in tables B-2 through B-5, users may wish to know the sampling error for an estimate whose value is not listed. To derive such a sampling error, it is necessary to use interpolation or extrapolation.

For example, in order to derive the sampling error for the 2014 total unemployment level for women in Ohio, it is necessary to use interpolation because table B-5 contains no sampling error for an unemployment-level estimate of 143,000. The following formula and accompanying example show how to interpolate for this estimate:

    SE = {[(A-G) / (F-G)] × (X-Y)} + Y.
    In this equation,
      SE = the sampling error for the estimated value,
      A = the estimated value (143,000),
      F = the table value (200,000) immediately above the estimated value,
      G = the table value (100,000) immediately below the estimated value,
      X = the sampling error of F (19,000), and
      Y = the sampling error of G (14,000).

    Thus (levels in thousands),
    SE = {[(143 - 100) / (200 - 100)] × (19 - 14)} + 14
    SE = ( 0.43 × 5 ) + 14
    SE = 2.15 + 14
    SE = 16.15, or approximately 16

If the sample-based estimate lies outside the boundaries of the error tables, extrapolation can be used to approximate the sampling error. The formula for extrapolation is the same as that for interpolation; however, the F term becomes the highest value in the table and the G term becomes the next-highest value.
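The same linear formula serves for both interpolation and extrapolation, so one small Python function covers both cases. The first call reproduces the Ohio example; the second is a hypothetical extrapolation that assumes, purely for illustration, that 100 and 200 (thousands) are the two highest values in the table.

```python
def interpolate_error(a, g, f, y, x):
    """Sampling error for estimate a, given table values g < f with errors y and x.

    Works unchanged for extrapolation when a lies above f.
    """
    return (a - g) / (f - g) * (x - y) + y


# Ohio example from the text (levels in thousands).
ohio_error = interpolate_error(a=143, g=100, f=200, y=14, x=19)

# Hypothetical extrapolation: estimate of 250 beyond an assumed table maximum of 200.
extrapolated_error = interpolate_error(a=250, g=100, f=200, y=14, x=19)

print(round(ohio_error, 2), round(extrapolated_error, 2))
```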

Derivation of sampling errors

The state and area sampling errors are developed with a generalized regression procedure and are not based on sample data for each individual area, population group, or labor force characteristic. As with all sampling error tables produced for CPS state and area data, a number of approximations are required in order to derive sampling errors that apply to a wide variety of items. As a result, these sampling errors indicate the order of magnitude of the error rather than a precise error for any specific item. The sampling error tables are derived from standard error equations and special parameters developed by the Bureau of Labor Statistics. These parameters are available upon request from the Division of Local Area Unemployment Statistics, Bureau of Labor Statistics, Room 4675, 2 Massachusetts Avenue NE, Washington, DC 20212-0001. Telephone: (202) 691–6392.

Tables B-2 through B-5 can be used for estimates pertaining to any race or ethnic group whose data are published. As noted, the sampling errors are based on a generalized regression procedure and are approximate. Generally, the degree of precision in these tables is slightly greater for Whites (and the total of all race and ethnic groups) than it is for Blacks or Hispanics.

1 The four sampling criteria are (1) that the adjustment factor be greater than 1.3; (2) that the adjustment factor be less than 1/1.3 (or 0.769230 in decimal form); (3) that there be fewer than four NSR sample PSUs in the state; and (4) that there be fewer than 10 expected interviews in an age-race cell in the state.

E-Mail: gpinfo@bls.gov
Last Updated: September 23, 2015