Consumer Expenditure Survey Microdata Users’ Workshop and Survey Methods Symposium, 2014
The 2014 Consumer Expenditure Survey Microdata Users’ Workshop and Survey Methods Symposium included presentations from BLS and non-BLS economists and researchers.
The Consumer Expenditure Survey (CE) is the most detailed source of expenditures, demographics, and income collected by the federal government. Every year, the Bureau of Labor Statistics (BLS) CE program releases microdata on the CE website from its two component surveys (the Quarterly Interview Survey and the Diary Survey), which are used by researchers in a variety of fields, including academia, government, market research, and other private industry areas.1
In July 2006, the CE program office conducted the first in a series of annual workshops to help users better understand the structure of the CE microdata; provide training in the uses of the surveys; and, through presentations by current users and interactive forums, promote awareness of the different ways the data are used and explore possibilities for collaboration.
Starting in 2012, the program office added a day to the event for a symposium to explore topics in survey methods research in support of Gemini, a major project to redesign the CE survey (more information is available at www.bls.gov/cex/geminiproject.htm). In addition to the CE program staff, workshop speakers have included economists from BLS regional offices and researchers not affiliated with the BLS; similarly, symposium speakers have included CE program staff, other BLS National Office staff, and speakers from outside the BLS. This report describes the survey methods symposium, which took place on July 15, 2014, and the most recent workshop, which took place July 16–18, 2014.
Survey methods symposium
As in previous years, the goals of the 2014 CE Survey Methods Symposium were 1) to provide an overview of CE Redesign initiatives, including results from the Web Diary Feasibility test, as well as outline current initiatives related to the Gemini Survey Redesign project and 2) to feature projects associated with the redesign, including improving recall through life events questions, reducing proxy reporting, and tracking measurement error. There were two sessions, one focusing on each topic.
Results from Testing a Web Mode for the Consumer Expenditure Diary Survey. Ian Elkin (CE) provided an overview of the Web Diary Feasibility Test and report, specifically highlighting relevant findings and noting how those findings were being incorporated into CE’s next feasibility test, the Individual Diaries Feasibility Test.
Positive findings included higher median expenditures for certain diary sections; lower item nonresponse, as defined for a diary instrument; and a week-to-week drop-off in expenditures that validates the redesign’s shift to a one-week collection period. Web Diary inputs leading into an improved Individual Diaries Feasibility Test included sampling areas with high levels of internet penetration; streamlined and efficient instrument design, specifically for those accessing the instrument through a mobile device; straightforward, yet comprehensive, respondent materials; in-depth field representative training with hands-on access to test instruments; additional follow-up procedures to reduce item nonresponse; continued paradata monitoring to identify respondents with a propensity for nonresponse; and modified receipt/recall procedures to ensure timely collection of data at diary pickup. Similar to the preliminary findings presented at last year’s symposium, the results from the Web Diary Feasibility Test analysis reinforced the potential benefits of a multimodal approach; however, the results also highlighted that additional refinement is necessary to fully realize the potential of such an approach.
CE’s Gemini Redesign: Updates on Development, Implementation, and Evaluation. As a continuation of the work presented at the 2013 CE Survey Methods Symposium, Laura Erhard (CE) provided an overview of the multiyear Gemini Project, focusing on the objectives that are the underpinning of the entire redesign process. Redesign objectives guiding the current research agenda include a verifiable reduction in measurement error with a particular focus on underreporting, no harmful effect on response rates, a neutral budgetary impact, and a secondary objective of reducing respondent burden.
To meet these objectives, the information-gathering process started by reaching out to CE users to determine what aspects of the survey and subsequent data are most critical. The objectives of gauging user impact were to obtain feedback from users on any serious issues and to educate users on the structure and content of the new CE redesign proposal and on the timeline for implementation. Outreach activities included a redesign of the Gemini website; presentations at conferences, meetings, and webinars; Federal Data users meetings; a notice in the Federal Register; and a user’s impact survey. The results of these activities included generally positive feedback, particularly regarding the focus on improved data quality. However, users also expressed concerns, including limits on the detail collected in the survey, an inability to capture expenditures with longer transaction times, seasonality issues at a micro level, and an inability to use four consecutive quarters for analysis.
In addition to an overview of the Gemini Project, Erhard provided an overview of the upcoming Proof of Concept (POC) test. The POC test is designed to ensure that the basic underlying structure and components of the new design are feasible and to mirror the proposed redesign to the fullest extent possible. Results from the POC test will be used to assess methodological issues such as respondent willingness to complete all survey components, operational factors such as logistical issues with incentives, and experiential factors involving both respondents and field representatives.
The presentation concluded with a look ahead at future Gemini Redesign-related projects and field tests.
Research projects associated with the redesign
“Happy Birthday”: Life Events as Reminders of Spending. Brett McBride (CE) presented on using the aided recall method of inquiring about life events as a reminder of spending. The factors that motivated this study included improving CE data quality by limiting underreporting caused by recall error, examining aided recall methods that might prompt recall of expenses that typical questions do not adequately prompt, and validating a previous study that suggested some association of events with reporting expenses.
The aided recall method of using life events as a reminder of spending did cue reporting from Consumer Units (CU) with events associated with expenditure, such as a wedding or birthday party, and these CUs reported significantly more in total expenditures than CUs with no reported events. However, CUs with life events with no expenses associated with them did not report more expenditures than those with no life events reported, and there was no significant association found between the number of events reported and total expenditures. These findings led to the conclusion that what respondents consider salient expense events in their recent past are not being omitted when they respond to the CE Quarterly Survey and that use of life events may be more effective as a bonding technique rather than a cuing technique.
Asking Questions about Household Members to Improve Proxy Reporting. Erica Yu (OSMR) presented on improving proxy reporting through asking questions about household members. Inherently, proxy reporting has advantages (a single respondent provides information about others) and disadvantages (proxy information is typically of a lower quality than self-reported information). Subsequently, possible methods for improving proxy reporting were examined in order to develop a standardized protocol that can be used in a production setting; these methods were to remind respondents to consider others, to cue respondents to recall others’ actual events rather than rely on dispositions, and to cue respondents to recall out-of-the-ordinary deviations from typical behaviors.
Yu’s research indicated that a protocol of questions and probes can improve reporting, but probes at the time of reporting should cue respondents with as much detail as possible to encourage them to retrieve memories and should ask about others’ hobbies, what others spend their money on, and changes to day-to-day routines.
Measurement Error in CE: Monitoring the Quality of the Estimates. The final symposium presentation, by Roger Tourangeau (Westat), focused on monitoring specific measures that can be used on an ongoing basis to track measurement error in the CE Survey and recommended a multi-method-indicators approach that consists of three main categories: external indicators, internal indicators, and a comparison of CE production estimates with “gold standard” interviews. It is precisely because no one approach is perfect that Tourangeau proposes coming at the issue by tracking measurement error from several angles, noting that a multi-angle approach will provide a more comprehensive picture of the CE quality.
An external indicator approach compares CE estimates with external data sources, such as the Personal Consumption Expenditures from the National Income and Product Accounts (NIPA). The success of this approach depends on the external data sources meeting a set of criteria: comparability, consistency, ease of producing a comparable estimate, timeliness of the benchmark, and comprehensiveness of the estimate. Internal indicators are based solely on CE data or information about the data collection process, such as a comparison of the Interview Survey and the Diary Survey. Internal indicators should be robust, easy to interpret, and based on a similar metric to the external indicators. The final approach is to compare CE reports with “gold standard” interviews, checking whether the ratios in the internal and external indicators for a given commodity category are similar to those from the “gold standard.” Tourangeau notes that four factors are critical for a successful “gold standard” interview:
- Incentives for records collection or diary keeping
- Other inducements for encouraging record keeping
- Length of the reference period (burden versus stability of estimates)
- Selection of commodity categories
Tourangeau recommended that CE build on past efforts and develop a time series with multiple indicators, keeping in mind that no approach is perfect.
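The external-indicator comparison described above can be illustrated with a simple ratio of a CE aggregate to an external benchmark for the same commodity category. The following is only a minimal sketch: the function name, the category labels, and all dollar figures are hypothetical placeholders chosen for illustration, not CE or NIPA estimates, and the presentation did not prescribe this particular calculation.

```python
def coverage_ratio(ce_aggregate, benchmark_aggregate):
    """Ratio of a CE aggregate to an external benchmark for one category.

    A ratio well below 1.0 would suggest that the CE captures less
    spending than the benchmark does for that category.
    """
    if benchmark_aggregate <= 0:
        raise ValueError("benchmark aggregate must be positive")
    return ce_aggregate / benchmark_aggregate


# Hypothetical aggregates (CE, benchmark) in billions of dollars.
# These numbers are invented for illustration only.
hypothetical_aggregates = {
    "food at home": (480.0, 600.0),
    "apparel": (150.0, 300.0),
}

ratios = {
    category: coverage_ratio(ce, benchmark)
    for category, (ce, benchmark) in hypothetical_aggregates.items()
}

for category, ratio in ratios.items():
    print(f"{category}: {ratio:.2f}")
```

Computing such a ratio for the same categories each year would yield the kind of time series of indicators that the recommendation envisions, with the caveat that differences in definitions between the two sources would need to be reconciled first.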
Conclusions. Similar to past symposiums, the 2014 CE Survey Methods Symposium was a successful event focusing on the most recent actions CE has taken to redesign its surveys as well as emphasizing forthcoming approaches that are being considered for testing and possible implementation.
The central conclusion that can be drawn from the presentations and discussions from the Survey Methods Symposium is that, similar to most if not all redesign efforts, there is a measure of uncertainty regarding which tested method will resonate with respondents while maintaining/improving data quality, and the CE should ensure that the overall redesign process examines as many methodological options as feasible.
Microdata users’ workshop
Day one. The first day of the 2014 workshop opened with presenters from the CE program. Ryan Pfirrmann-Powell provided an overview of the CE, featuring topics such as how the data are collected and published. Scott Curtin then presented an introduction to the microdata, including an explanation of their features, such as data file structure and variable naming conventions.
Following a break, the morning concluded with presentations by researchers not affiliated with the CE program who have used the microdata for a variety of purposes (Andres Torrubia and Sheng Guo). The afternoon was dedicated to practical training, in which attendees had the opportunity to perform exercises using microdata. The day concluded with an information-sharing group session among workshop participants and CE program staff. This was an open forum in which attendees met informally to discuss their research and offered suggestions for improving the microdata. Because the practical training is progressive, until 2011 this session was held on the second day to maximize overlap in attendance between newer and more experienced users. However, in response to comments from attendees at prior workshops, in 2012 the session was scheduled for the first day of the workshop and successfully repeated in this order in 2013 and 2014.
Day two. The second day opened with more advanced topics, with Brian Nix of the BLS Division of Price Statistical Methods presenting technical details about sampling methods and construction of sample weights, and Evan Hubener (CE program) speaking on imputation and the allocation of expenditure data in the CE.
Following a research presentation entitled “The Effect of Casinos on Household Consumption” (Li Zhang), the technical instruction resumed with a topic of perennial interest to CE microdata users: how to apply longitudinal weights to the interview data. As noted in the introduction by Bill Passero (CE program), the Interview Survey collects data from respondents for 4 consecutive calendar quarters. During each interview, the respondent is asked to provide information on expenditures for various items during the past 3 months. However, not all participants remain in the sample for all four of these interviews. Those who do remain have different characteristics (e.g., higher rates of homeownership and higher average age) than those who do not. Therefore, attempting to analyze average annual expenditures by only examining respondents who participate for all four interviews yields biased results.
Following the Passero presentation, the workshop pivoted to a session explaining an important feature of certain variables in the microdata: topcoding. In a presentation entitled “Balancing Respondent Confidentiality and Data User Needs,” Aaron Cobet (CE program) explained that, in order to preserve the confidentiality of the data, values for some variables, such as income sources and certain expenditures (e.g., rent), are topcoded. In this process, values that exceed a predetermined critical value are replaced with a new value. In each case, changed values are flagged for user identification.2 However, topcoding by its very nature affects data quality, in some cases substantially. Daniel Yang (BLS Office of Survey Methods Research) presented work he coauthored with Daniell Toth (same office) entitled “Statistical Distance Measurements for CE Data Disclosure Utility Evaluation,” in which he described how topcoding of income affects various analyses, especially regression results.
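The topcoding process described above (replace values over a critical value, then flag the changed values) can be sketched in a few lines. This is an illustrative sketch only. It assumes that the replacement value is the mean of the values exceeding the critical value, which is one common choice; the text above says only that such values are "replaced with a new value." The function name, the critical value, and the rent figures are hypothetical.

```python
from statistics import mean


def topcode(values, critical_value):
    """Return (topcoded_values, flags) for a list of reported amounts.

    Assumption for this sketch: every value above the critical value is
    replaced by the mean of all values above it. Flags mark which
    entries were changed so that users can identify them.
    """
    above = [v for v in values if v > critical_value]
    replacement = mean(above) if above else None

    topcoded, flags = [], []
    for v in values:
        if v > critical_value:
            topcoded.append(replacement)
            flags.append(True)   # value was changed by topcoding
        else:
            topcoded.append(v)
            flags.append(False)  # value is as reported
    return topcoded, flags


# Hypothetical monthly rents; 3000 is an invented critical value.
rents = [800, 1200, 950, 4000, 6000]
coded, flags = topcode(rents, critical_value=3000)
print(coded)
print(flags)
```

Under this assumed rule the overall mean is unchanged, but the spread among the highest values is lost, which is one way to see why analyses that depend on the upper tail, such as regressions on income, can be affected.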
After a break for lunch, Carol Boyd Leon and Charlotte Irby, technical writer-editors of the Monthly Labor Review (MLR), described the publication process, from submission to printing, for authors interested in having their work appear in the MLR. Following this, external researcher Jane Yoo presented her work entitled “The Role of Inter Vivos Giving in General Equilibrium.” The afternoon concluded with additional practical training.
Day three. On the final day, CE staff featured advanced topics, starting with Neil Tseng explaining how sales taxes are applied to expenditure reports during the data production process. Next, Geoffrey Paulin described the correct use of imputed income data and sample weights in computing population estimates. The latter session noted that the proper use of weights requires a special technique to account for sample design effects that, if not employed, result in estimates of variances and regression parameters that are incorrect.3 Researcher Jonathan Peters followed, with a discussion of his research comparing CE data to Highway User Fee data with a focus on tolls in the New York and New Jersey area. After a break, Ting Yan presented the final research presentation, in which she described response burden: specifically, what factors predict it, and how it affects data quality. The session concluded with a “sneak peek” of developments for CE microdata by Steve Henderson. Of particular note was the announcement that starting with the release of the 2013 data (September 9, 2014), the CE will be using the National Bureau of Economic Research TAXSIM program to estimate and publish federal and state income taxes, which testing shows will yield a significant improvement in the quality of these data.4 Further ahead, he noted, are many changes to the Interview Survey in 2015. These include new health care questions, the dropping of the first interview or “bounding survey,”5 and a redesigned sample.6 Regarding publication, he also noted that detailed data tables, currently only available on request, will be published online at the all-consumer-unit level, as will historical data for 1972–1973. At the time of the presentation, final dates had not yet been announced.
After a lunch break, practical training continued, including a presentation of a computer program available with the microdata for use in computing correct standard errors for means and regression results when using 1) unweighted non-imputed data; 2) population-weighted non-imputed data; and 3) multiply imputed income data, both unweighted and population weighted (Paulin). Finally, attendees were debriefed in a feedback session designed to solicit opinion on how to improve future workshops, CE program outreach, and other topics of interest to attendees.
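To illustrate why replicate weights matter for standard errors, the sketch below computes a weighted mean and a replicate-based standard error in the spirit of balanced repeated replication. It is not the program distributed with the microdata. It assumes the common estimator in which the variance is the average squared deviation of the replicate estimates from the full-sample estimate, and all function names, weights, and expenditure values are hypothetical.

```python
import math


def weighted_mean(values, weights):
    """Weighted mean of values using one set of weights."""
    total_weight = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total_weight


def replicate_standard_error(values, full_weights, replicate_weights):
    """Standard error of a weighted mean from replicate weights.

    Assumed estimator: the variance is the mean, over replicates, of the
    squared difference between each replicate estimate and the
    full-sample estimate.
    """
    full_estimate = weighted_mean(values, full_weights)
    squared_deviations = [
        (weighted_mean(values, rep) - full_estimate) ** 2
        for rep in replicate_weights
    ]
    return math.sqrt(sum(squared_deviations) / len(replicate_weights))


# Hypothetical expenditures for four consumer units.
spending = [100, 200, 300, 400]
full = [1, 1, 1, 1]
# Hypothetical replicates: each one doubles half of the sample and
# drops the other half.
replicates = [
    [2, 0, 2, 0],
    [0, 2, 0, 2],
    [2, 2, 0, 0],
    [0, 0, 2, 2],
]

estimate = weighted_mean(spending, full)
se = replicate_standard_error(spending, full, replicates)
print(estimate, round(se, 2))
```

A real analysis would use the full-sample weight and the complete set of replicate weights supplied on the microdata files rather than the toy weights shown here.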
2015 Symposium and workshop
The next Survey Methods Symposium will be held July 14, 2015, once again concomitant with the next microdata users’ workshop (July 15–17). While the symposium and workshop will remain free of charge to all participants, advance registration is required. For more information about these and previous events, visit the CE website (www.bls.gov/cex) and look for “Annual Workshop” under the left navigation bar titled “PUBLIC-USE MICRODATA.” For direct access to this information, the link is www.bls.gov/cex/csxannualworkshop.htm. Additional details about the 2014 symposium are available at www.bls.gov/cex/geminimaterials.htm.
Highlights of workshop presentations
Following are highlights of the papers presented during the workshop, listed in the order of presentation. They are based on summaries written by the respective authors.
Andres Torrubia, CEO, Fixr, Inc., “Visualizing transportation, health care, and utilities costs for renter C[onsumer] U[nit]s” (Interview and Diary Surveys), day one.
We used the adaptation to the "R" language of the program "Integrated Mean and SE.sas" (provided in the examples) available here: https://github.com/ajdamico/asdfree/tree/master. We modified the "R" program as per the methodology above; however, the "R" program was unbearably slow (30 minutes or more for each run) and we had to use a RAM disk to speed it up to 10 minutes approximately.
Sheng Guo, Florida International University, “Financial Wealth, Housing Wealth and Housing Dividends” (Interview Survey), day one.
For a household, home ownership provides necessary shelter, potential investment returns associated with property appreciation and a hedge against changes associated with housing expenditures. In addition to potential appreciation, individual households benefit over time from a housing dividend defined as the difference between the market rent for the individual household’s housing unit and the household’s actual expenditures for housing. The purchase of a house substantially fixes a household’s recurring housing expenditures and generates an implied housing dividend that increases with ownership tenure. This dividend to home ownership is documented using pooled, cross-year samples from the Consumer Expenditure Survey [CE]. The housing dividend delivers a non-trivial effect on household expenditures after controlling for housing value, housing equity, financial assets and income.
Li Zhang, Ph.D. candidate, University of Virginia, “The Effect of Casinos on Household Consumption” (Interview Survey), day two.
I study the effect of casino openings on household consumption. Using Consumer Expenditure Survey [CE] data during 2001–2011, I show that an average household increases quarterly non-gambling spending by $440 in response to the appearance of casinos in its state of residency. The higher non-gambling spending, together with other evidence that casino openings also induce higher gambling spending, implies that casino gambling is crowding out future consumption by decreasing household savings. My finding contrasts with the crowd-out effect of lottery spending on current consumption documented in the literature. I use the self-reported gambling data to estimate the effect of casino openings on consumption conditional on household gambling-participation status. Since gambling participation is substantially under-reported in the CE, I plan to employ an instrument-like variable (ILV), the distance from a household to the nearest casino, to solve the under-reporting problem. To measure the distance, I applied for access to the confidential [CE] dataset, which contains the county codes of individual household residence.
Jane Yoo, Ph.D., Assistant Professor, Ajou University, South Korea. “The role of inter vivos giving in general equilibrium” (Interview and Diary Surveys), day two.
This paper presents an overlapping-generations model for studying the role of inter vivos giving in general equilibrium. The empirical part of the paper is composed of two parts: finding the difference in household wealth and income between a gift-recipient and a non-recipient, and measuring the size and the effect of intergenerational transfer in a household budget. The first part is studied with the Survey of Consumer Finances (1998–2010). For the latter, I used the [CE] data. The size of inter vivos giving is matched with that of IRS public data. After calibrating parameters, which are related to gift taxes, I studied the long-run effect of lowering gift tax rates. The results show that there is welfare gain in the steady state while the wealth inequality is not severely increased.
Jonathan Peters, Ph.D., The City University of New York. “Measurement of Road User Charges in the United States – Comparison of BLS CE Data and Highway User Fee Data” (Interview and Diary Surveys), day three.
The BLS utilizes a number of key measurement tools to identify consumer spending patterns and the cost of goods. One recent area of interest in transportation is road pricing. Road pricing is the charging of fees to utilize areas or sections of the road network. Various forms are proposed and in place, including congestion pricing, time of day pricing, vehicle miles traveled charging and various types of tolling. These changes in road pricing are profound. Where in the past the vast bulk of transportation fees and taxes were collected on fuel or vehicle registrations, today the BLS and others face a range of charging mechanisms that are captured in the Expenditures Universal Classification Code 520541 – Tolls or electronic toll passes.
The authors look to explore the BLS UCC 520541 data and compare it with a number of sources that are available in the area of road pricing to help calibrate and support the collection of data on road pricing. In particular, we seek to examine other survey and electronic toll records to estimate the cost of road pricing by geography. Our research team has an extensive data archive on tolling systems and we are currently geocoding the data to line up with the BLS data. Given that tolling costs are reported to be relatively low in the average household’s expenditures, the geographic concentration of tolling and heavy burden on certain households is of particular interest to urban planners and policy staff.
Ting Yan, Ph.D., University of Michigan. “Response burden: What predicts it and what is the impact on data quality?” (Interview Survey), day three.
Objectives of this research:
- What causes response burden?
- What is the impact of response burden on data quality?
- What can be done to reduce or counteract negative effects of response burden?
What causes response burden?
- Respondent motivation has strong impact on response burden
- Task characteristics have weaker impact on response burden
- Recruitment efforts have no impact on response burden
- Perception of survey task has strong impact on response burden
- Respondents who reported “very burdensome” exhibited worse response behaviors and produced data of worse quality
- Removing these cases
  - Doesn’t seem to change mean estimates
  - Doesn’t seem to change conclusions from regression models
  - Could result in cost savings in terms of
    - Number of contact attempts saved
    - Number of production hours saved
    - Number of items to be edited and/or imputed reduced
Staff of the CE Program
Crain, Vera. Economist, Branch of Information and Analysis (BIA); days one, two and three. Workshop leader and session coordinator.
Cobet, Aaron. Economist, BIA; day two
Curtin, Scott. Supervisory Economist, Chief, Microdata Section, BIA; day one
Henderson, Steve. Supervisory Economist, Chief, BIA; day three
Hubener, Evan. Economist, Branch of Production and Control (P&C); day two
Johnson-Cox, Barbara. Economist, P&C; day three
Passero, Bill. Supervisory Economist, Chief, Processing and Analysis Section, BIA; day two
Paulin, Geoffrey. Senior Economist, BIA; day three
Pfirrmann-Powell, Ryan. Economist, formerly with BIA; days one and two
Other BLS speakers
Boyd Leon, Carol. Technical Writer-Editor, Monthly Labor Review Branch; day two
Irby, Charlotte. Technical Writer-Editor, Monthly Labor Review Branch; day two
Nix, Brian. Mathematical Statistician, Division of Price Statistical Methods; day two
Yang, Daniel. Research Mathematical Statistician, Office of Survey Methods Research; day two
Speakers from outside BLS
Guo, Sheng, “Financial Wealth, Housing Wealth and Housing Dividends” (Interview Survey), day one.
Peters, Jonathan, “Measurement of Road User Charges in the United States – Comparison of BLS CE Data and Highway User Fee Data” (Interview and Diary Surveys), day three.
Torrubia, Andres, “Visualizing transportation, health care, and utilities costs for renter C[onsumer ]U[nit]s” (Interview and Diary Surveys), day one.
Yan, Ting, “Response burden: What predicts it and what is the impact on data quality?” (Interview Survey), day three.
Yoo, Jane, “The role of inter vivos giving in general equilibrium” (Interview and Diary Surveys), day two.
Zhang, Li, “The Effect of Casinos on Household Consumption” (Interview Survey), day two.
Ian Elkin and Geoffrey D. Paulin, "Consumer Expenditure Survey Microdata Users’ Workshop and Survey Methods Symposium, 2014," Monthly Labor Review, U.S. Bureau of Labor Statistics, July 2015, https://doi.org/10.21916/mlr.2015.25.
1 The Quarterly Interview Survey is designed to collect data on expenditures for big-ticket items (e.g., major appliances, cars, and trucks) and recurring items (e.g., payments for rent, mortgage, or insurance). In the Interview Survey, participants are visited once every 3 months for 5 consecutive quarters. Data from the first interview are collected only for bounding purposes and are not published.
In the Diary Survey, participants record expenditures daily for 2 consecutive weeks. The survey is designed to collect expenditures for small-ticket and frequently purchased items, such as detailed types of food (e.g., white bread, ground beef, butter, lettuce).
The CE microdata may be downloaded on the CE website (https://www.bls.gov/cex/pumd.htm).
2 Details about topcoding are provided in the public-use microdata documentation for the year of interest. (See, for example, Consumer Expenditure Interview Survey, Public Use Microdata, 2013 User’s Documentation, September 10, 2014, https://www.bls.gov/cex/2013/csxintvw.pdf.)
3 The CE sample design is pseudorandom. The proper use of weights requires the use of the method of balanced repeated replication.
4 These data have since been published, and the expected outcome has been confirmed. See “Estimating Taxes in the Consumer Expenditure Survey,” by Geoffrey Paulin and William Hawk, Monthly Labor Review, https://www.bls.gov/opub/mlr/2015/article/pdf/improving-data-quality-in-ce-with-taxsim.pdf.
5 The purpose of the bounding survey is to ensure that consumers interviewed more than once do not report expenditures in subsequent interviews for which data have already been collected. As an example, if a respondent in the first interview reports purchase of a refrigerator for $500, and does so once again in the second interview, the interviewer can make sure that the second-interview report is indeed a new refrigerator, different from the one reported 3 months earlier in the bounding survey.
6 The sample redesign occurs decennially, when certain cities or other areas enter the sample, and others leave, based on changes in population or other factors.