
Consumer Expenditure Surveys Public-use Microdata Getting Started Guide

This page provides documentation for Consumer Expenditure Surveys (CE) Public-use Microdata (PUMD), its conventions, files, sample code, and methodology.

Section 1. CE program
Section 2. CE PUMD
Section 3. Interview Survey
Section 4. Diary Survey
Section 5. Sample code
Section 6. CE PUMD methodology

Section 1. CE program

The Consumer Expenditure Surveys (CE) program provides data on expenditures, income, and demographic characteristics of consumers in the United States. The CE program provides these data in tables, LABSTAT databases, news releases, publications, and public-use microdata files.

CE data are collected by the Census Bureau for the Bureau of Labor Statistics (BLS) in two surveys, the Interview Survey for major and/or recurring items, and the Diary Survey for more minor or frequently purchased items. CE data are primarily used to revise the relative importance of goods and services in the market basket of the Consumer Price Index. The CE program conducts the only Federal household survey to provide information on the complete range of consumers' expenditures and income. For more information, see the overview section in the CE chapter in the BLS Handbook of Methods.

Section 2. CE PUMD

CE PUMD provide the individual responses to the two surveys from respondents. The data have been adjusted to protect the confidentiality of respondents. The CE PUMD allow researchers to analyze expenditure, income, and demographic data beyond what is provided in published tabulations.

2.1 CE PUMD files

CE PUMD include data from both the Interview and Diary Surveys. Most files are analogous between the two surveys; however, the Interview Survey files contain roughly 50 additional detailed data files, as well as paradata files that provide detail about the collection process. For years prior to 1996, file availability may be limited. Table 3 Interview Survey files and content lists the major files currently available, and their content. For a more comprehensive list of files provided in the CE PUMD, see the Dictionary for the Interview and Diary Surveys.

Are the CE PUMD required for a particular research topic?
A research topic may require the detail that only PUMD files provide, but the CE program does provide a wealth of information that has already been tabulated and may be sufficient for a user's analysis. This information includes tables, LABSTAT databases, news releases, and publications. To learn more about these products, see the introduction to the CE data products.
What is required in order to use the CE PUMD?
Users of CE PUMD need to be familiar with statistical concepts and be proficient with a statistical software package, such as SAS, R, or STATA.
What to consider when using PUMD files?
The files contain individual survey responses. Thus their uses depend on the survey design. For example, the CE survey design supports reliable national averages of major expenditures. However, it may not support reliable estimates for some states. For more information, see CE Considerations When Using the Public-use Microdata.
What data formats are available?
The data files are available in SAS, STATA, SPSS, and CSV and can be downloaded from the public-use microdata data files page. If users' research requires access to CE microdata without the disclosure restrictions applied, they can apply to be visiting researchers on the BLS onsite researcher page.
Where can users obtain additional information?
If users have comments or questions about this page and its contents, contact us.

2.2 CE PUMD file conventions

For both the Interview and Diary Surveys, the files use the following conventions:

How are CE PUMD files named?
CE PUMD file naming conventions consist of three parts:
- File name
- Calendar year (YY)
- Quarter (Q) if applicable. Quarter can be 1-5.
The detailed annual Interview Survey files do not specify the quarter but only the year, for example intrvw16.zip\expn16\cla16.sas7bdat.
What types of values do CE PUMD variables use?
CE PUMD variables are stored in one of the following three formats:
- Numeric (NUM): Variables that predominantly contain dollar amounts and counts
- String (CHAR): Variables that contain a sequence of alphanumeric characters
- Categorical (CHAR): Coded variables
Where do data users find descriptions for the CE PUMD variables?
The Dictionary for the Interview and Diary Surveys contains a description for each variable. In addition to the description, the dictionary lists the associated codes, the location of a variable within the files and within the survey, and the duration in which a variable existed in the PUMD.
How do data users track changes in PUMD?
The Dictionary for the Interview and Diary Surveys tracks detailed changes to files, variables, and codes. For large survey changes, consult the history page in the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.
How do data users identify a unique record in the CE PUMD?
Identifying a unique record depends on the file. For a list of the primary key variables for each file, see Table 3 Interview Survey files and content and Table 6 Diary Survey files and content.
How do data users link an interview or diary for a given Consumer Unit (CU) in different files?
NEWID links data for one CU across interviews and files. Users cannot link CUs across surveys because the Diary and Interview surveys use different samples.
How is the variable NEWID structured?
NEWID is a unique sequential number concatenated with the number of the interview. The last digit of NEWID indicates the interview number in a series of 4, or the week of diary collection in a series of 2. All digits before the last digit identify a CU.
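
As a minimal sketch of the structure described above, the following Python function splits a NEWID into the CU identifier and the final digit. The function name `split_newid` and the sample values are illustrative assumptions, not part of the CE PUMD. NEWID is handled as text here so that any leading zeros are preserved.

```python
def split_newid(newid):
    """Split a NEWID into (CU identifier, interview or diary-week number).

    Assumes the last digit is the interview number (Interview Survey)
    or the week number (Diary Survey), as described in the text.
    """
    text = str(newid)
    if len(text) < 2 or not text.isdigit():
        raise ValueError(f"unexpected NEWID value: {newid!r}")
    return text[:-1], int(text[-1])


# Hypothetical NEWID values for one CU in two consecutive interviews.
cu_a, interview_a = split_newid("03456782")
cu_b, interview_b = split_newid("03456783")
same_cu = cu_a == cu_b  # both records share the same CU identifier
```

Grouping records by the first element of the returned pair is one way to follow a CU across interviews within a single survey.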

Section 3. Interview Survey

3.1 Interview Survey overview

The Interview Survey is a rotating panel survey in which approximately 10,000 addresses are contacted each calendar quarter, yielding approximately 6,000 usable interviews. One-fourth of the addresses that are contacted each quarter are new to the survey. After a housing unit has been in the sample for four consecutive quarters, it is dropped from the survey, and a new address is selected to replace it. For more information, see the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.

Before 2015, the Interview Survey included a preliminary bounding interview, and each CU could be contacted up to five times over five quarters. Although data from the bounding interview were not published, its purpose was to minimize telescoping errors.^ⅰ The CE program stopped fielding the bounding interview in 2015 due to concerns about its effectiveness in reducing telescoping errors, cost, and impact on respondent burden. For more information, see Ian Elkin's article Recommendation regarding the use of a CE bounding interview.^ⅱ

3.2 Interview Survey file conventions

For the Interview Survey, the files use the following conventions:

What does an Interview "quarter" refer to?
The interview "quarter" refers to the calendar quarter in which the interview occurred. For example, any CU interviewed in April, May, or June would have their data stored in the quarter 2 (YYQ2) datasets. During an interview, the CU is asked to report expenditures for the three months prior to the interview. So, for a CU interviewed in April, the expenditures in the YYQ2 file are for January, February, and March. This distinction is important to remember when calculating calendar year estimates.
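
The relationship between the interview month, the collection quarter, and the reference months can be sketched in a few lines of Python. The function names below are hypothetical helpers written for this illustration; they are not part of the CE sample code.

```python
def collection_quarter(interview_month):
    """Return the calendar quarter (1-4) in which the interview occurred."""
    if not 1 <= interview_month <= 12:
        raise ValueError("interview_month must be between 1 and 12")
    return (interview_month - 1) // 3 + 1


def reference_months(interview_year, interview_month):
    """Return the three (year, month) pairs that precede the interview month."""
    months = []
    year, month = interview_year, interview_month
    for _ in range(3):
        month -= 1
        if month == 0:  # step back across the year boundary
            year, month = year - 1, 12
        months.append((year, month))
    return sorted(months)  # chronological order


# A CU interviewed in April 2016 is stored in quarter 2 and reports
# expenditures for January through March 2016.
april_quarter = collection_quarter(4)
april_reference = reference_months(2016, 4)
```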
How many quarters does a CE PUMD release include?
Each CE PUMD release includes five quarters: four quarters for the release year (YYQ1-YYQ4) and the first quarter of the next year.

Why do some CE PUMD Interview Survey files exist as part of two different data releases?
Each data release contains five quarters of Interview Survey data in order to allow users to calculate a calendar year estimate. For more information on this calculation, see Section 6.1 Estimation procedures for the Interview Survey.

Each annual data release of the CE PUMD is processed using new data and new disclosure avoidance guidelines. For quarters that appear in two different data releases, an "x" is added to the end of the file name. This "x" is used as an indicator to inform users that the two files were processed under a different set of rules and conditions and therefore the content may differ slightly. It is at the user's discretion as to which file to use.

Table 1: Description of "x" in file names for the fifth quarter
Is the "x" included?	What file and release?	Did the files, methods, or data change?	Example
No	Fifth file of previous year's release	No, they stayed the same as in the previous four quarters in the package.	FMLIYYQ.sas7bdat
Yes	First file of current year's release	Yes, they changed from the previous year's version.	FMLIYYQx.sas7bdat
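
To make the naming rules concrete, the sketch below lists the five quarterly file names that Table 1 and the naming convention imply for one release. The helper name `release_file_names` is hypothetical, and the lowercase names without a file extension are assumptions; users should check the names against the contents of the downloaded release.

```python
def release_file_names(prefix, year):
    """List the five quarterly file names expected in one annual release.

    Follows the name + YY + Q convention. The first-quarter file carries
    an "x" because the same quarter also appeared as the fifth file of
    the previous year's release.
    """
    yy = f"{year % 100:02d}"
    next_yy = f"{(year + 1) % 100:02d}"
    names = [f"{prefix}{yy}1x"]                       # reprocessed first quarter
    names += [f"{prefix}{yy}{q}" for q in (2, 3, 4)]  # quarters 2 through 4
    names.append(f"{prefix}{next_yy}1")               # fifth quarter, no "x"
    return names


files_2016 = release_file_names("fmli", 2016)
```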

What do the flag values in the Interview Survey represent?
In the Interview Survey files, data fields are explained by using flags for selected variables. Variables that have a flag variable associated with them are identified in the Dictionary for the Interview and Diary Surveys, on the Variable tab, under the column "Flag Name." Table 2 lists the codes for flags in the Interview Survey. Pre-1996 data will contain a subset of the flag values listed below.

Table 2: Interview Survey flag variable codes
Flag value	Description
A	Valid blank; a blank field where a response is not anticipated
B	Invalid blank due to invalid nonresponse; nonresponse that is not consistent with other data reported by the CU
C	Blank due to "Don't know," refusal, or other nonresponse
D	Valid value; unadjusted
E	Valid value; allocated
F	Valid value; imputed or adjusted in some other way
G	Valid value; allocated and imputed
H	Valid blank for an expenditure that is a "parent record" where the expenditure was allocated to other records and the original expenditure was overwritten with a blank
T	Valid value; topcoded or suppressed
U	Valid value; allocated then topcoded or suppressed
V	Valid value; imputed or adjusted in some other way then topcoded or suppressed
W	Valid value; allocated and imputed or adjusted in some other way then topcoded or suppressed
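
One way to work with these codes is to group them by the properties that Table 2 describes. The grouping and the helper `describe_flag` below are a sketch for illustration only; the set names are assumptions, and the membership of each set is read directly from the table.

```python
# Flag groups derived from the descriptions in Table 2.
BLANK_FLAGS = {"A", "B", "C", "H"}      # the data field holds no value
ALLOCATED_FLAGS = {"E", "G", "U", "W"}  # value was allocated
IMPUTED_FLAGS = {"F", "G", "V", "W"}    # value was imputed or otherwise adjusted
TOPCODED_FLAGS = {"T", "U", "V", "W"}   # value was topcoded or suppressed
KNOWN_FLAGS = BLANK_FLAGS | ALLOCATED_FLAGS | IMPUTED_FLAGS | TOPCODED_FLAGS | {"D"}


def describe_flag(flag):
    """Return the properties of an Interview Survey flag value."""
    if flag not in KNOWN_FLAGS:
        raise ValueError(f"unknown flag value: {flag!r}")
    return {
        "blank": flag in BLANK_FLAGS,
        "allocated": flag in ALLOCATED_FLAGS,
        "imputed_or_adjusted": flag in IMPUTED_FLAGS,
        "topcoded": flag in TOPCODED_FLAGS,
    }


unadjusted = describe_flag("D")  # every property is False for an unadjusted value
```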

3.3 Interview Survey file types

Table 3 summarizes the Interview Survey files currently available. If users encounter a file that is not listed below, consult the Dictionary for the Interview and Diary Surveys for additional details.

Table 3: Interview Survey files and content
Name	Content	Variable periodicity^ⅲ	Files per release	Primary keys^ⅳ	First year
FMLI	CU level Summary Expenditures	Quarterly^ⅴ	5	NEWID	1980
	CU level income, assets, and liabilities	Annual
	CU characteristics and weights	NA
MTBI	Monthly expenditures	Monthly	5	NEWID, SEQNO, ALCNO, UCC, RTYPE, EXPNAME, UCCSEQ, REF_MO, REF_YR	1980
MEMI	Member level income	Annual	5	NEWID, MEMBNO	1980
	Member characteristics	NA
ITBI	Detailed income	Monthly	5	NEWID, UCC, REF_MO, REFYR	1980
ITII	Imputed income iterations	Monthly	5	NEWID, UCC, REF_MO, REFYR, IMPNUM	2004
NTAXI	Estimated federal and state income taxes	Annual	5	NEWID and TAXID	2013 Q2
Detailed data files	Detailed expenditure and non-expenditure data	Quarterly, monthly, weekly, or NA	Varies by year*	NEWID, SEQNO, ALCNO	1980
FPAR	Data related to the survey process	NA	1	NEWID	2009
MCHI	Data related to the contact history	NA	1	NEWID	2009
* For the specific detailed data files available, see PUMD dictionary.

3.3.1 Detailed data files - Detailed expenditure and non-expenditure data

The roughly 50 detailed data files include expenditure and non-expenditure information that is directly collected from sections of the Interview Survey (see the Survey materials page for more information). For years prior to 1994, there may be fewer files. The Dictionary for the Interview and Diary Surveys contains additional information related to the content and makeup of each of these files. Each detailed data file consists of five quarters of data. Because these files correspond to specific sections in the survey, they differ in several ways. These are the main differences:

The reference periods may differ due to different questions.
The number of records per CU differs. Some files have multiple records per CU, some have one record per CU, and some have no records for a CU interviewed in a given quarter.
The method to identify unique records differs. Users can identify unique records with NEWID and, depending on the file, the following variables:
- SEQNO is assigned sequentially during the interview as each expenditure record is recorded into the database.
- ALCNO is assigned sequentially for each record that has been allocated from one expenditure. For example, a CU may report spending $50 on a pair of men's pants and a shirt. The CE program will allocate out that record into two separate records, one for men's pants and shorts ($30) and one for men's shirts ($20).

Here is an example of the detailed data file VEQ (Vehicles, maintenance and repair) and some of the variables it contains.

VEQ-Vehicle maintenance and repair

VOPSERVY is an indicator variable that describes the type of maintenance or repair.
VOPMOA is an indicator variable for the month in which the expense occurred.
VOPEXPX is the total cost of the maintenance or repair expense.

3.3.2 Interview Survey Paradata files

Paradata files provide data about the interview process. The CE program began releasing paradata for the Interview Survey in 2009. The CE program does not release paradata for the Diary Survey. Paradata are available in two datasets:

FPAR - Data related to the survey process

Contains data about the survey, including timing for each section and whether the respondent used records.
Organized by NEWID.
Unique records are defined by NEWID and QYEAR.

MCHI - Data related to the contact history

Contains data about the contact history between the field representative and the respondent, including reasons for interview refusal and time of contact.
The files are organized by NEWID.
Unique records are defined by NEWID and QYEAR.

How many quarters are in the paradata files?
Each paradata file has nine quarters. These include four quarters for the first year, four for the second year, and one for the first quarter of the third year.

Section 4. Diary Survey

4.1 Diary Survey overview

The Diary Survey is a panel survey in which approximately 5,000 addresses are contacted each calendar quarter, yielding approximately 3,000 usable interviews.^ⅵ After a housing unit has been in the sample for two consecutive weeks, it is dropped from the survey, and a new address is selected to replace it. For more information, see the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.

4.2 Diary Survey file conventions

For the Diary Survey, the files use the following conventions:

What does a Diary Survey "quarter" refer to?
A Diary Survey "quarter" refers to the calendar quarter in which the Diary Survey booklet was placed in the home of the CU by the Census Field Representative. All Diary Survey files are organized as quarterly files.
What does a Diary Survey "week" refer to?
The Diary Survey "week" refers to the 7 consecutive days in which the data were recorded. Respondents only record expenditures of that week. Each CU is in the sample for two consecutive weeks. Each Diary Survey week is assigned to the Diary Survey quarter in which it was recorded.

What do the flag values in the Diary Survey represent?
In the Diary Survey files, data fields are explained by using flags for selected variables. Variables that have a flag variable associated with them are identified in the Dictionary for the Interview and Diary Surveys, on the Variable tab, under the column "Flag Name." Table 4 lists the codes for flags in the Diary Survey. Pre-1996 data contain a subset of the flag values listed below.

Table 4: Diary Survey flags
Flag value	Description
A	Valid blank; a blank field where a response is not anticipated
B	Blank due to invalid nonresponse; nonresponse that is not consistent with other data reported by the CU
C	Blank due to "Don't know," refusal, or other nonresponse
D	Valid value; unadjusted
E	Valid value; allocated
T	Valid value; topcoded or suppressed

For Diary Survey expenditures located on the EXPD files, the variable ALLOC can be used to determine whether an expenditure has been adjusted, allocated, topcoded, or any combination of the three. Table 5 lists the allocation codes and their corresponding flag values.

Table 5: Diary Survey allocation codes
ALLOC Code	Description	Corresponding Flag
0	Valid value, unadjusted	D
1	Valid value, allocated	E
2	Topcoded and allocated	T
3	Topcoded, not allocated	T
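
Table 5 can be expressed as a small lookup, as in the sketch below. The names `ALLOC_CODES` and `alloc_status` are hypothetical; the sketch also assumes that ALLOC may arrive either as a number or as a one-character string, which users should confirm against the file format they downloaded.

```python
# ALLOC code -> (description, corresponding flag), as listed in Table 5.
ALLOC_CODES = {
    0: ("Valid value, unadjusted", "D"),
    1: ("Valid value, allocated", "E"),
    2: ("Topcoded and allocated", "T"),
    3: ("Topcoded, not allocated", "T"),
}


def alloc_status(code):
    """Summarize an EXPD ALLOC code as allocation and topcoding indicators."""
    key = int(code)  # accepts 2 as well as "2"
    if key not in ALLOC_CODES:
        raise ValueError(f"unknown ALLOC code: {code!r}")
    description, flag = ALLOC_CODES[key]
    return {
        "description": description,
        "flag": flag,
        "allocated": key in (1, 2),
        "topcoded": key in (2, 3),
    }


status = alloc_status("2")  # topcoded and allocated, flag "T"
```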

4.3 Diary Survey file types

Table 6 summarizes the Diary Survey files currently available. Data prior to 1994 may include fewer files. If users encounter a file that is not listed below, consult the Dictionary for the Interview and Diary Surveys for additional details.

Table 6: Diary Survey files and content
Name	Content	Variable periodicity	Files per release	Primary keys^ⅶ	First year
FMLD	Summary expenditures	Weekly	4	NEWID	1980
	CU level income, assets, and liabilities	Annual
	CU characteristics and weights	Annual
MEMD	Member level income	Annual	4	NEWID and MEMBNO	1980
	Member characteristics	NA
EXPD	Detailed expenditure and non-expenditure data	Weekly	4	NEWID and ALLOC	1980
DTBD	Detailed income	Annual	4	NEWID and UCC	1980
DTID	Income imputation iterations	Annual	4	NEWID, UCC, IMPNUM	2004

Section 5. Sample code

This section provides sample code for CE PUMD. When using the code, users may want to consider these points:

The code can integrate data from both surveys or draw solely from one survey. Integration refers to combining estimates from the Interview and Diary Surveys. Code options within the programs allow users to decide which type of estimate they want.
The code may utilize the hierarchical groupings (zip), which are available from 1996 forward.
The code was built for use with the current PUMD structure and may require adjustments for earlier years, particularly for years before 1996.

The CE program provides the following sample code:

Purpose	Weights	Period	Estimates	Notes	Code
Approximates CE table 1203 Income before taxes	Weighted	Calendar	Aggregate annual expenditures; expenditure means and standard errors by income groups	CE PUMD estimates may not match the table estimates. For more information, see FAQ 26 on the CE FAQ page.	SAS
Aggregates selected UCCs	Weighted	Calendar	Aggregate annual expenditures; expenditure means and standard errors	The code integrates data from both surveys on a UCC level. For a list of UCCs, see introduction of hierarchical grouping files.	SAS R STATA
Aggregates selected variables	Weighted and un-weighted	Collection and calendar	Computes all items above and regresses on imputed and non-imputed data.	Code uses the Balanced Repeated Replication (BRR) method in a predefined SAS proc. For more information, see SAS macro for the CE surveys.	SAS

Section 6. CE PUMD methodology

This section describes the estimation procedures for the Interview Survey and the estimation procedures for the Diary Survey; the formulas to estimate weighted annual calendar year estimates; and sampling statements. The CE program integrates information from both the Interview and Diary Surveys in its publications. Therefore, any analysis limited to only one survey may produce results that do not match the published CE estimates. In addition, users may find that estimates do not match the published estimates due to the non-disclosure criteria that are applied to the CE PUMD. For more information on non-disclosure requirements, see the Protection of Respondent Confidentiality page.

6.1 Estimation procedures for the Interview Survey

This section discusses procedures for estimating annual calendar year means with data from the interview surveys. Field representatives interview CUs to collect the cost of all expenses during the prior three months. Data collected by each interview are treated as statistically independent - each quarter's interview is separately weighted to be representative of the population. For more information, see the collections and data sources section in the Consumer Expenditures and Income chapter in the BLS Handbook of Methods.

For the Interview Survey, users may want to consider the following general concepts:

What information does the Interview Survey ask CUs?
The Interview Survey asks respondents about all expenses that the CU incurs during the survey period as well as information about financial data and demographic information. For more information on what is included and excluded in expenditures, see the entry on expenditures on the glossary page.
How are Interview Survey data organized?
The Interview Survey data are organized and identified by quarter, but particular files may provide data by month or by year. For more information on the data's periodicity, see Table 3: Interview Survey files and content.
How many quarters does a user need for calendar year estimates?
To produce calendar year estimates, users need to access all five quarters of data: all four quarters of the year of interest and the first quarter of the subsequent year. For example, for 2017 estimates, users need the files for quarters 1 through 4 of 2017 and quarter 1 of 2018.
Why do users need data from two years to estimate one calendar year?
Data users need data from two consecutive years to calculate calendar year estimates because, in the Interview Survey, respondents report expenditures for the three months prior to the interview. Thus, in January, February, and March interviews, a CU may report expenditures from the previous year, which are out of scope for a current calendar year estimate.

When calculating data for 2016, interviews conducted in January 2017 cover expenditures made between October 2016 and December 2016, and are used to estimate data for these three months in 2016. Similarly, interviews conducted in March 2017 cover expenditures between December 2016 and February 2017 and are used to estimate data for December 2016. Thus, users have to use the first file for 2017 to estimate data for the last quarter of 2016. Chart 1 illustrates that concept. The green months show those that are in scope for the estimates of 2016 and the yellow months show those months in 2017 that are out of scope.
Chart 1: Months in scope for quarter 5 (FMLI171)

A similar differentiation of scope happens at the beginning of the year. The data collected in January of 2016 are not in scope for 2016 expenditures because the January interview collects data for the last 3 months of 2015. However, data collected in February and March 2016 are partially in scope. Data collected in February includes data for January of 2016, and data collected in March 2016 includes data for January and February 2016. See chart 2.
Chart 2: Months in scope for quarter 1 (FMLI161)

Finally, for interviews conducted in April through December, all reference months are in scope. For example, quarter 2 interviews conducted in April, May, and June collect expenditure data for January 2016 through May 2016, which are all in scope for 2016. See chart 3. The same holds true for quarters 3 and 4.
Chart 3: Months in scope for quarter 2 (FMLI162)
How much does a CU contribute to a calendar year estimate in each interview month?
A CU's contribution depends on the interview month and year. For information on how to identify a CU's contribution to a calendar year estimate, see Section 6.3 Formulas.
Is the periodicity of variable values consistent across files?
No, it is not. Different files and different variables within files may have different periodicities. For more information, see Table 3: Interview Survey files and content.

6.2 Diary Survey estimation procedures

This section provides users of the Diary Survey with procedures to estimate annual calendar means.

CUs self-report a detailed description of all expenses using a product-oriented diary for two consecutive 1-week periods. Data entries can start on any day of the week. Data collected each week are treated as statistically independent - each week's diary is separately weighted to be representative of the population. For more information, see the collections and data sources section in the chapter of Consumer Expenditures and Income in the BLS Handbook of Methods.

For the Diary Survey, users may want to consider the following concepts:

What information does the Diary Survey ask CUs?
The Diary Survey asks for almost all expenses that the CU incurs during the survey week. In addition, the Diary Survey also asks about income and demographic information. The Diary Survey excludes expenses incurred by family members while away from home overnight or on vacation, and for credit and installment plan payments.
How are Diary Survey data organized?
The Diary Survey data are organized and identified by the day an item was purchased.
How do users identify the purchase date?
Users cannot identify the exact purchase date. However, users can identify the start month of the reference week (STRTMNTH), the day of the week (EXPNWDY), the sequential day of the survey (EXPNSQDY), as well as the reference month (EXPNMO) and year (EXPNYR).
How many quarters does a user need for annual calendar year estimates?
To produce calendar year estimates, users need to access the four collection-quarter files of the year of interest. For example, for 2017 estimates, users need the files for quarters 1 through 4 of 2017.
How much does a CU contribute to a calendar year estimate in each interview month?
In the Diary Survey, a CU contributes 100 percent of its expenditures to the calendar year. Unlike the Interview Survey, the Diary Survey has no lag between the time an expenditure occurs and the time it is reported, which means that the potential contribution of each CU to the mean is the same.

6.3 Formulas

The formulas described below can be used to calculate weighted estimates that use data from both surveys. The formulas calculate annual calendar year aggregates, averages, and standard errors for expenditures and reported income. While these formulas can also be used to calculate annual averages of imputed income, they cannot be used to calculate standard errors for imputed income. For more information on this topic, see the Description of Income Imputation Beginning with 2004 Data.

What is the impact of different periodicity by different variables?
Variable periodicity can be annual, quarterly, monthly or weekly. When working with the PUMD, users need to take the particular periodicity into account. For example, when combining weekly data with annual data, users need to account for the difference in periodicity by inflating the weekly data to represent a quarterly value.
How to calculate comprehensive estimates of expenditures and income?
To gain a complete picture of expenditures and income, users need to integrate data from both surveys. The CE program collects data with two independent surveys. While they complement each other with respect to the data collection, they use independent samples that do not overlap. To see which UCCs the CE tables use to integrate and from which survey, see the Source Selection File.
How to integrate data from both surveys?
To integrate data from both surveys, users first need to choose which expenditures they would like to integrate. Some items are only collected in one survey while others are collected in both surveys. Once a user has established which UCCs they would like to integrate, users can then develop an estimate for each UCC individually. After estimates have been developed for each UCC individually, sum the results to develop an integrated estimate.

When integrating data across surveys, keep in mind that estimates created from the Diary Survey yield a weekly amount. Users therefore need to adjust their estimates so that each survey result represents the same time period. Multiplying the Diary Survey UCC estimate by 13 results in a quarterly amount, which can then be summed with an Interview Survey estimate.
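
The adjustment can be sketched as follows. The function name, the placeholder UCC labels, and the dollar values are all hypothetical, and the check that a UCC is taken from only one survey is an assumption made for this illustration.

```python
WEEKS_PER_QUARTER = 13  # multiplier that converts a weekly amount to a quarterly amount


def integrated_quarterly_estimate(interview_quarterly, diary_weekly):
    """Sum per-UCC estimates after putting both surveys on a quarterly basis.

    interview_quarterly: {UCC: quarterly estimate from the Interview Survey}
    diary_weekly:        {UCC: weekly estimate from the Diary Survey}
    """
    overlap = set(interview_quarterly) & set(diary_weekly)
    if overlap:
        # Assumption: each UCC is sourced from one survey to avoid double counting.
        raise ValueError(f"UCCs supplied from both surveys: {sorted(overlap)}")
    total = sum(interview_quarterly.values())
    total += sum(WEEKS_PER_QUARTER * value for value in diary_weekly.values())
    return total


# Placeholder UCC labels and values, for illustration only.
estimate = integrated_quarterly_estimate({"UCC_A": 500.0}, {"UCC_B": 20.0})
# 500.0 from the Interview Survey plus 13 * 20.0 from the Diary Survey
```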
How to calculate representative statistics?
Users can calculate representative statistics with the weight variable FINLWT21. This variable attributes a weight to each NEWID, which allows users to estimate values for the entire population. This variable is available in the FMLI and FMLD files.
Does the CE program provide sample code that uses the below formulas?
Yes, the CE program provides sample code with the same logic in SAS, STATA, and R on the PUMD documentation page.

6.3.1 Developing a weighted calendar year estimate

This section presents the methods to calculate the population, aggregate values, and average values for expenditures or income for a calendar year.

Denominator: Population
NEWID = Identifier for one CU for one quarter
FINLWT21 = Weight of each NEWID
QNUM = Number of quarters in the analysis (Usually equal to 4 for a 1 year estimate)
MO_SCOPE = Indicator for the number of months in scope for each NEWID

How do users calculate representative population weights (FINLWT21)?
To make the population weights representative of the U.S. population, data users need to generate two adjustment factors:
- "QNUM" adjusts the weights from annual to quarterly. The CE sample is designed so that each collection quarter is representative of the entire U.S. population. Thus, for an annual estimate (4 quarters), QNUM is 4 and the weight (FINLWT21) needs to be divided by 4. Without this adjustment, the population in the denominator would be 4 times as large as the U.S. population.
- "MO_SCOPE/3" adjusts the weights for CUs that are out of scope. Interviews that were conducted in January, February, or March are not fully in scope. (This applies only to the interview survey. For more information, see Section 6.1 Estimation procedures for the Interview Survey.) For these months, only the part that is in scope should be used for representative population weights. MO_SCOPE adjusts the CU weights to the months in scope.
How to determine the value for MO_SCOPE?
The value for MO_SCOPE depends on the survey. For all four quarters within the Diary Survey, the value for MO_SCOPE is 3. For the Interview Survey, MO_SCOPE depends on the year and month of the interview. Users can identify the year using the FMLI variable QINTRVYR. For a description of what months are in scope, see Section 6.1 Estimation procedures for the Interview Survey.

For the first four quarters, MO_SCOPE is defined by the value of QINTRVMO:
- If QINTRVMO is 1 then MO_SCOPE is 0
- If QINTRVMO is 2 then MO_SCOPE is 1
- If QINTRVMO is 3 then MO_SCOPE is 2
- If QINTRVMO is 4-12 then MO_SCOPE is 3
For the fifth quarter, MO_SCOPE is defined by the value of QINTRVMO:
- If QINTRVMO is 1 then MO_SCOPE is 3
- If QINTRVMO is 2 then MO_SCOPE is 2
- If QINTRVMO is 3 then MO_SCOPE is 1
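
The mapping above can be sketched as a small Python function. This is a minimal illustration, not the CE program's sample code. It assumes that the "first four quarters" are interviews with QINTRVYR equal to the estimate year and that the "fifth quarter" covers interviews in January through March of the following year. The function name, the `estimate_year` and `survey` parameters, and the choice to return 0 for any other interview are hypothetical.

```python
def mo_scope(qintrvyr: int, qintrvmo: int, estimate_year: int,
             survey: str = "interview") -> int:
    """Return the number of months in scope (0-3) for one NEWID."""
    if survey == "diary":
        # For all four Diary Survey quarters, MO_SCOPE is 3.
        return 3
    if not 1 <= qintrvmo <= 12:
        raise ValueError(f"QINTRVMO must be 1-12, got {qintrvmo}")
    if qintrvyr == estimate_year:
        # First four quarters: Jan -> 0, Feb -> 1, Mar -> 2, Apr-Dec -> 3.
        return min(qintrvmo - 1, 3)
    if qintrvyr == estimate_year + 1 and qintrvmo <= 3:
        # Fifth quarter: Jan -> 3, Feb -> 2, Mar -> 1.
        return 4 - qintrvmo
    # Assumption: any other interview contributes no months to this year.
    return 0


# Example: interviews used in an estimate for calendar year 2019.
for year, month in [(2019, 1), (2019, 3), (2019, 8), (2020, 2)]:
    print(year, month, mo_scope(year, month, estimate_year=2019))
```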

Numerator: Aggregate value
X = Expenditures or income variables by NEWID. This formula can be used for quarterly, annual, weekly, or monthly data.
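
With X defined this way, the aggregate value can be written as the weighted sum below. This is a reconstruction from the definitions in this section rather than a copy of the published formula, and it assumes X already contains only the expenditures or income that fall within the period of interest.

```latex
\text{Aggregate} = \sum_{\text{NEWID}} \left( \text{FINLWT21} \times X \right)
```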

Quotient: Average value
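
The average value is the aggregate value (numerator) divided by the population (denominator). The sketch below works through all three quantities on made-up records. It assumes each record already carries FINLWT21, MO_SCOPE, and an in-scope value X; the function name and the toy numbers are hypothetical and are not CE data.

```python
def calendar_year_estimate(records, qnum=4):
    """Return (population, aggregate, average) for a list of NEWID records."""
    population = 0.0
    aggregate = 0.0
    for rec in records:
        # Denominator: weight adjusted by QNUM and by months in scope.
        population += rec["FINLWT21"] * rec["MO_SCOPE"] / (3 * qnum)
        # Numerator: weight times the expenditure or income value.
        aggregate += rec["FINLWT21"] * rec["X"]
    if population == 0:
        raise ValueError("population is zero; no records are in scope")
    return population, aggregate, aggregate / population


# Toy records (illustrative values only).
toy = [
    {"FINLWT21": 12000.0, "MO_SCOPE": 3, "X": 500.0},
    {"FINLWT21": 12000.0, "MO_SCOPE": 1, "X": 100.0},
    {"FINLWT21": 20000.0, "MO_SCOPE": 0, "X": 0.0},
]
print(calendar_year_estimate(toy))  # (4000.0, 7200000.0, 1800.0)
```

Note that the record with MO_SCOPE equal to 0 adds nothing to either the population or the aggregate in this example.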

6.3.2 Reliability statement

Description of sampling and non-sampling errors
Sample surveys are subject to two types of errors, sampling and non-sampling. Sampling errors occur because observations are not taken from every unit in the entire population. Standard errors measure sampling errors. The primary purpose of standard errors is to provide users with a measure of the variability associated with the mean estimates. The sample estimate and its estimated standard error enable one to construct confidence intervals.
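
As an illustration of how an estimate and its standard error combine into a confidence interval, the sketch below uses a normal approximation. The multiplier 1.96 (for an approximate 95 percent interval) is a common convention assumed here, not a value stated in this guide, and the function name and inputs are hypothetical.

```python
def confidence_interval(estimate, standard_error, z=1.96):
    """Return (lower, upper) bounds of estimate +/- z * standard_error."""
    margin = z * standard_error
    return estimate - margin, estimate + margin


# Example with made-up numbers: a mean of 1800 with a standard error of 50.
lower, upper = confidence_interval(1800.0, 50.0)
print(lower, upper)
```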

Non-sampling errors can be attributed to many sources, such as definitional difficulties, differences in the interpretation of questions, inability or unwillingness of the respondent to provide correct information, mistakes in recording or coding the data obtained, and other errors of collection, response, processing, coverage, and estimation of missing data. Estimates using a small number of observations are less reliable. Research articles examining CE measurement error and nonresponse bias are included in the CE library. The CE program regularly examines CE data in the annual data quality assessment and compares CE results with other sources of federal statistics. For more information, see the Data Quality and Comparisons page.

Estimating sampling error
The CE program estimates sampling error using Balanced Repeated Replication (BRR). The CE program implements this method with three steps:

1. Selects 44 subsamples that are balanced half-samples of the full sample.
2. Estimates a statistic for each half-sample, using the replicate weight variables WTREP01-WTREP44. Each replicate weight variable contains a value greater than 0 for CUs assigned to that replicate and a missing value for CUs not assigned to that replicate.
3. Estimates the variance between the full-sample and half-sample values with the standard formula for computing sample variances.
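
The steps above can be sketched in Python as follows. This is a simplified illustration, not the CE program's sample code: it omits the QNUM and MO_SCOPE adjustments, represents a missing replicate weight as `None`, and uses two made-up replicates instead of 44. It also assumes the variance is the average squared deviation of the replicate estimates from the full-sample estimate. All function names are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean that skips CUs whose weight is missing (None)."""
    numerator = 0.0
    denominator = 0.0
    for x, w in zip(values, weights):
        if w is None:
            continue  # CU is not assigned to this replicate
        numerator += w * x
        denominator += w
    return numerator / denominator


def brr_standard_error(values, full_weights, replicate_weights):
    """Standard error from replicate means around the full-sample mean."""
    full_mean = weighted_mean(values, full_weights)
    squared_devs = [
        (weighted_mean(values, rep) - full_mean) ** 2
        for rep in replicate_weights
    ]
    return math.sqrt(sum(squared_devs) / len(replicate_weights))


# Toy data: four CUs, one full-sample weight column, two replicate columns.
x = [100.0, 200.0, 300.0, 400.0]
finlwt21 = [1.0, 1.0, 1.0, 1.0]
replicates = [
    [2.0, None, 2.0, None],  # stands in for WTREP01
    [None, 2.0, None, 2.0],  # stands in for WTREP02
]
print(brr_standard_error(x, finlwt21, replicates))
```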

Replicate means for expenditures
WTREP = 44 Replicate weights (WTREP01-WTREP44)
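
A replicate mean can be written by substituting each replicate weight for FINLWT21 in the calendar year average from Section 6.3.1. The expression below is a reconstruction under that assumption, not a copy of the published formula; sums run over every NEWID with a non-missing value of the replicate weight, and r indexes the replicates.

```latex
\bar{X}_r = \frac{\sum_{\text{NEWID}} \left( \text{WTREP}_r \times X \right)}
                 {\sum_{\text{NEWID}} \left( \frac{\text{WTREP}_r}{\text{QNUM}} \times \frac{\text{MO\_SCOPE}}{3} \right)},
\qquad r = 1, \dots, 44
```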

Standard error
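
One way to write the standard error of a mean from the 44 replicate estimates is shown below, where the full-sample mean is computed with FINLWT21 and each replicate mean is computed with the corresponding replicate weight. This is a reconstruction that assumes the squared deviations are averaged over the 44 replicates and centered on the full-sample mean; it is not a copy of the published formula.

```latex
SE(\bar{X}) = \sqrt{ \frac{1}{44} \sum_{r=1}^{44} \left( \bar{X}_r - \bar{X} \right)^2 }
```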

Note that prior to 1990, 20 replicate weights were used, instead of the 44 that are currently in use. When developing a standard error using data prior to 1990, use the replicate weight variables FINLWT01-FINLWT20 in your calculation.

Note that this method does not work for imputed income data. For information on calculating sampling errors from imputed income, see the User's Guide to Income Imputation in the CE.

6.4 Sampling statement

6.4.1 Survey sample design

The CE sample is a nationwide household sample representing the entire U.S. civilian noninstitutional population. It includes people living in houses, condominiums, apartments, and group quarters such as college dormitories. It excludes military personnel living overseas or on base, nursing home residents, and people in prisons. The civilian noninstitutional population represents more than 98 percent of the total U.S. population. For more information, see sample design in the chapter Consumer Expenditures and Income of the BLS Handbook of Methods.

6.4.2 Weighting

Each CU included in the CE sample represents a given number of CUs in the U.S. population, which is considered to be the universe. Weighting is used to adjust the relative contribution of each CU to reflect the inverse of its selection probability, as well as to account for nonresponse and to match certain characteristics to known control totals. For more information, see sample design in the chapter Consumer Expenditures and Income of the BLS Handbook of Methods.

^ⅰ Telescoping errors refer to the temporal displacement of an event. Respondents to the CE surveys may perceive recent events to be more remote than they are (backward telescoping) and distant events to be more recent than they are (forward telescoping).

^ⅱ Ian Elkin, Recommendation regarding the use of a CE bounding interview, 2013, Bureau of Labor Statistics.

^ⅲ Variable periodicity refers to the period that a given value represents.

^ⅳ Primary keys identify each unique record in the database.

^ⅴ Quarterly summary expenditures are presented as two variables - one containing expenditures made in the previous calendar quarter and one containing expenditures made in the current calendar quarter.

^ⅵ For more information on the number of contacted addresses and completed interviews, see the CE Data Quality Profile.

^ⅶ Primary keys identify each unique record in the database.

Last Modified Date: January 16, 2020