Consumer Expenditure Survey Methods Symposium and Microdata Users’ Workshop, July 16–19, 2019
The Consumer Expenditure Surveys (CE) program collects expenditures, demographics, and income data from families and households. The CE program held its annual Survey Methods Symposium and Microdata Users’ Workshop from July 16 to 19, 2019, to address CE-related topics in survey methods research, to provide free training in the structure and uses of the CE microdata, and to explore possibilities for collaboration. Economists from the CE program, staff from other U.S. Bureau of Labor Statistics offices, and research experts in a variety of fields—including academia, government, and private industry—gathered together to explore better ways to collect CE data and to learn how to use the microdata once they are produced.
The Consumer Expenditure Surveys (CE) are the most detailed source of data on expenditures, demographics, and income that the federal government collects directly from families and households (or, more precisely, “consumer units”).1 In addition to publishing standard expenditure tables twice a year, the U.S. Bureau of Labor Statistics (BLS) CE program releases annual microdata on the CE website from its two component surveys (the Quarterly Interview Survey and the Diary Survey).2 Researchers use these data in a variety of fields, including academia, government, market research, and various private industry areas.
In July 2006, the CE program office conducted the first in a series of annual workshops in order to achieve three goals: (1) to help users better understand the structure of the CE microdata; (2) to provide training in the uses of the surveys; and (3) to promote awareness, through presentations by current users and interactive forums, of the different ways in which the data are used, and thus provide opportunities to explore collaboration. In 2009, the workshop expanded from 2 days to 3 days to include presentations from data users not affiliated with BLS. This allowed users to showcase their experiences with the public use microdata (PUMD) files, to discuss problems and successes using the data, and to seek comment and guidance from CE program staff in completing their work.
Since 2012, the program office has preceded the workshop with a 1-day symposium to explore topics in survey methods research in support of the CE Gemini Redesign Project (Gemini Project), a major initiative to redesign the CE (for more information, go to https://www.bls.gov/cex/geminiproject.htm).
In addition to the CE program staff, workshop speakers have included economists from BLS regional offices and researchers not affiliated with BLS. Similarly, symposium speakers have included CE program staff, other BLS National Office staff, and speakers from outside BLS. This article describes the 2019 Survey Methods Symposium, conducted on the afternoon of July 16, 2019, and the 2019 Microdata Users’ Workshop, conducted July 17–19, 2019, at the BLS National Office in Washington, D.C.
Survey methods symposium
The 2019 Consumer Expenditure (CE) Survey Methods Symposium focused on the challenges and opportunities facing BLS household surveys and followed a slightly different format than that used in the 2018 Symposium.3 Staff members from four BLS surveys participated in the symposium: the Current Population Survey (CPS), American Time Use Survey (ATUS), National Longitudinal Surveys (NLS), and CE. The symposium focused on three research topics: (1) incentive delivery methods and experiments, (2) assessment of data quality, and (3) alternate modes of data collection. The CE program office invited representatives from the other BLS household surveys to share information about their existing methods and experiences regarding these research topics. The goals of the symposium were (1) to share CE research findings with stakeholders, survey researchers, and data users and (2) to promote a discussion about common challenges and solutions facing CE and other household surveys.
The afternoon was divided into three sessions, each centered on one of the three research topics. In each session, three presenters discussed the major challenges their survey was facing related to the topic and how the surveys are working to address them. At the end of each session, a discussant provided a commentary on the session and moderated a discussion about the topic and the presentations, encouraging presenters and attendees to engage in dialogue.
In 2019, the symposium drew 42 attendees from areas including universities, academic programs in survey methodology, nonprofit organizations, private companies, and federal agencies. In the following research topic sections, a review of the presentations is given, followed by a discussion of the combined key takeaways.
The symposium started with an introduction to the CE redesign by Dr. Parvati Krishnamurty from the CE program at BLS. The presentation outlined the original plan for the redesign and recent modifications made to the redesign plan prior to implementation into production. The redesign plan, which was intended to be implemented as a whole, was found to not be budget neutral on an ongoing production basis.4 Therefore, the plan was modified to move to a phased implementation of key design elements into the CE surveys. The phased implementation plan is to retain the design elements that have been shown to be effective, including a streamlined questionnaire with less expenditure detail, a records focus (including a targeted incentive for record use), online diaries, and token incentives. These elements will be implemented directly into the CE Diary and Interview Surveys conditional on results from testing, which is currently underway. Other design elements such as a single sample design, two-interview structure, and two-wave design could be tested and implemented in future years, pending changes to requirements or funding availability.
Session 1: Incentive delivery methods and experiments
The first session was on incentive delivery methods and incentive experiments. Declining response rates are an ongoing challenge for all household surveys, and one of the strategies developed to address this problem is the use of incentives. In this session, representatives from three BLS-sponsored surveys shared information about their recent or forthcoming large-scale incentive experiments.
Incentives and the American Time Use Survey: experimenting with cash, Rachel Krantz-Kent (ATUS). Ms. Krantz-Kent provided highlights of a planned ATUS incentive experiment that will be fielded beginning in the fall of 2019, if approved by the Office of Management and Budget. The ATUS is a retrospective time-use diary survey conducted by Computer-Assisted Telephone Interviewing; it selects its sample from households that have completed the 8th month of the CPS.5 ATUS currently sends $40 debit card incentives only to cases that cannot be reached by phone, that is, cases with no available phone number or with a nonworking phone number during the first week in sample. In addition to the debit card, these cases are sent a brochure about the survey and a letter that includes an appeal to call an interviewer and complete the survey.
Research has shown that prepaid, upfront incentives are more effective than promised incentives, such as the ATUS debit card.6 This finding, along with the costliness of using debit card incentives, has provided motivation for the proposed incentive study. In the study, cases without phone numbers will be assigned to one of three groups: a control group that receives no incentive, a group that receives a $5 cash incentive, and a group that receives a $10 cash incentive. Cases with nonworking phone numbers will be sent $5 cash incentives. All groups will be sent a brochure about the survey and a letter with an appeal to call an ATUS interviewer and complete the interview. With funds saved by using cash rather than debit card incentives, ATUS also plans to test the use of cash incentives in motivating 15- to 24-year-olds to complete the ATUS. People in this age range have had the lowest response rates to the ATUS in recent years, and they are showing the biggest declines in response rates over time.
The primary goals of the incentive experiment are to examine whether prepaid unconditional cash incentives are more effective than promised debit cards in the ATUS, and whether $5 or $10 cash incentives can boost response rates among 15- to 24-year-olds. The experiment will comprise one year of data collection, with results becoming available in mid-2021.
Innovations in incentives in the NLS, Donna Rothstein (NLS). The NLS consist of three active cohorts: the National Longitudinal Survey of Youth 1979 (NLSY79), the NLSY79 Child and Young Adult (biological children of women in the NLSY79), and the National Longitudinal Survey of Youth 1997 (NLSY97). Each longitudinal survey gathers detailed information about the respondents’ labor market activities and life experiences over time.7 Respondents are offered incentives to help secure their cooperation. Incentives are one lever for maintaining high NLS response rates; also important are respondent materials, interim contact, and interview length and content. In her presentation, Dr. Rothstein described the incentive structure for the most recent round of the NLSY79, which began fielding in the fall of 2018 and was conducted primarily by phone. The structure included a mix of incentives that depended on the phase of data collection: (1) a base incentive of $70 for a completed interview, with an extra $20 to $40 if the respondent missed recent prior rounds; (2) an early-bird incentive of an additional $30 over the base incentive for respondents who call in and complete the interview early; and (3) a final-push incentive later in the fielding period that is either (a) a standard $20, to encourage more reluctant sample members to participate, or (b) an enhanced $40, to recruit more difficult-to-reach subgroups who are underrepresented in the fielding response rates. Recently, the NLSY79 has begun offering electronic methods to deliver incentives, such as PayPal and online mobile banking (Chase QuickPay, for example). As of mid-June of the NLSY79 fielding, about one-third of the incentive payments had been made electronically. Compared with issuing checks, electronic payments offer lower costs, faster payment to respondents, and easier tracking of payments and resolution of any payment issues.
Experimenting with monetary incentives in the CE Survey, Barry Steinberg (CE). In 2016, CE Interview Survey staff conducted an incentive experiment to test a variety of incentives, both conditional and unconditional. Respondents were assigned to three experimental groups, each receiving a different combination of a token $5 incentive, a $40 debit card for completing the interview, and a $20 debit card for using records. A fourth group served as the control group and received no incentives.
The results indicated increases in response rates for all groups in the first interview. The effect persisted in the second interview but had diminished by the third interview. However, the authors saw no significant increase in expenditures reported, minimal decrease in number of contact attempts, some increase in records use, a decrease in refusal conversion, and significant differences in feeling burdened depending on whether or not the respondents received incentives. There were some operational issues with the experiment, including 30 to 40 percent of respondents reporting not receiving the token incentive sent to them by U.S. mail, and some respondents not activating the debit cards. While incentives were effective in increasing response rates slightly, they were not found to be cost effective overall. Therefore, only two kinds of incentives are being considered for inclusion in the surveys: token incentives, which could be implemented at a very low cost, and targeted record-use incentives, which might improve data quality by increasing record use by respondents.
Session 2: Assessment of data quality
Data quality is an important consideration in all surveys, and the notion of data quality includes both the accuracy of the data or responses and the representativeness of the sample. This is a particular concern for household surveys facing reluctant respondents and lack of access to certain demographic groups. In this session, staff from three BLS household surveys presented their recent research on data quality.
Electronically mediated employment: designing new questions and assessment of data quality, Karen Kosanovich and Roxanna Edwards (CPS). Ms. Kosanovich and Ms. Edwards described the experiment of adding four questions to the May 2017 Contingent Worker Supplement (CWS) of the CPS in order to measure electronically mediated work. BLS has periodically collected data on contingent and alternative employment arrangements in the CWS. These data addressed people in temporary jobs and alternative arrangements like independent contractors, on-call workers, temporary help agency workers, and workers provided by contract firms. The survey was fielded in 1995, 1997, 1999, 2001, and 2005. In early 2016, BLS obtained funding to field the CWS in May 2017 and subsequently investigated the possibility of collecting data about work arrangements that have emerged since the last time the CWS was fielded. BLS added four questions to the May 2017 CWS to measure an emerging type of work—electronically mediated work, defined as short jobs or tasks that workers find through websites or mobile apps that both connect them with customers and arrange payment for the tasks. This was the first attempt by BLS to measure this emerging type of employment arrangement.
After extensive review, BLS determined that these questions did not work as intended: they produced a large number of incorrect “yes” answers. To eliminate these false positives, BLS manually recoded the data using verbatim responses available only on the confidential microdata file. Using these recoded values, BLS estimated that electronically mediated workers accounted for 1.0 percent of total employment in May 2017. In the interest of transparency, BLS released both the collected data and the recoded data. BLS is currently working with the Committee on National Statistics on further study of this topic and has no plans to collect data on electronically mediated work using the questions added to the May 2017 supplement.
How different are NE cases and the ATUS sample: a consideration of demographic factors and data quality, David Biagas (OSMR). Dr. Biagas shared findings from his study comparing the characteristics of noneligible (NE) cases with eligible sample cases in the ATUS. NE cases make up about 3 to 4 percent of all cases and are not eligible for use in estimation. They are defined by several criteria relevant to the ATUS 24-hour diary day: (1) reporting fewer than five activities (labeled as “inactive”), (2) reporting more than 3 hours of time as “don’t know” (labeled for analysis as “forgetful”), or (3) reporting more than 3 hours of time as “refusal” (labeled as “reluctant”). Although these three subgroups are not mutually exclusive, the overlap between them was found to be relatively small. Compared with the ATUS-eligible cases, the NE cases were more likely to be persons who were older, female, black, widowed, out of the labor force, lower income, from smaller households, and without a high school diploma. Comparisons among the three subgroups indicated that the “forgetful” subgroup was driving these differences in characteristics between the NE cases and the ATUS-eligible cases. It was also found that, among the three subgroups, the “forgetful” subgroup reported earlier in the field period and provided better quality data. ATUS may further evaluate the criteria used to flag NE cases in the future.
Exploring the characteristics of partial interviews in the Consumer Expenditure Survey, Laura Erhard (CE). The CE Interview Survey has two kinds of partial interviews: (1) sufficient partials, where the respondent provides all expenditures but does not complete the income or assets sections; and (2) insufficient partials, where respondents start the survey but drop out at various points during the survey. Using audit trail data, Ms. Erhard identified 294 cases in the 2017 data that are noninterviews but have demographic information and more than 65 seconds of total interview time spent in the expenditure section. Comparing completed interviews with insufficient partial interviews, Ms. Erhard found only small demographic differences between the households: the insufficient partial interview cases are younger and have larger households, and a higher proportion are from the Northeast region. Contrary to expectations, the insufficient partial interview households did not necessarily drop out after wave 1; in fact, 13 percent completed other waves. Because these cases break off during the interview, she also examined where the breakoffs take place and found that 44.9 percent of breakoffs occur after the owned homes section. Most partials break off around 15–20 minutes into the interview, but it is not clear whether this is because respondents lose interest or because of the content of particular sections. Comparing partials with nonrespondents, the partials report higher values for monthly rent, owned property, and utility bills. However, the data for partial interviews are sparse, which limits their usefulness for processing.
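The three NE criteria amount to a simple flagging rule applied to each diary day. The sketch below is illustrative only: it assumes each diary day has already been summarized into an activity count and the minutes coded as “don’t know” or “refusal,” and the class, field, and function names are hypothetical rather than taken from ATUS processing systems. Because the subgroups can overlap, the function returns a set of labels instead of a single category.

```python
from dataclasses import dataclass

# Thresholds as described for the ATUS 24-hour diary day.
MIN_ACTIVITIES = 5            # fewer activities than this -> "inactive"
MAX_UNCODED_MINUTES = 3 * 60  # more than 3 hours -> "forgetful" or "reluctant"


@dataclass
class DiaryDay:
    """Hypothetical summary of one respondent's diary day."""
    n_activities: int       # number of activities reported
    dont_know_minutes: int  # minutes reported as "don't know"
    refused_minutes: int    # minutes reported as "refusal"


def ne_flags(day: DiaryDay) -> set:
    """Return every NE subgroup label that applies; an empty set means eligible."""
    flags = set()
    if day.n_activities < MIN_ACTIVITIES:
        flags.add("inactive")
    if day.dont_know_minutes > MAX_UNCODED_MINUTES:
        flags.add("forgetful")
    if day.refused_minutes > MAX_UNCODED_MINUTES:
        flags.add("reluctant")
    return flags


def is_noneligible(day: DiaryDay) -> bool:
    # A case is NE if it meets at least one of the three criteria.
    return bool(ne_flags(day))


# Example: 4 activities and 200 "don't know" minutes meet two criteria at once.
print(ne_flags(DiaryDay(n_activities=4, dont_know_minutes=200, refused_minutes=0)))
```

Note that exactly 3 hours of “don’t know” or “refusal” time does not trigger a flag, because the criteria say “more than 3 hours.”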
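The case-selection rule described above (a noninterview that nonetheless has demographic information and more than 65 seconds in the expenditure section) can be written as a filter over audit-trail summaries. This is a minimal sketch under stated assumptions: the `Case` fields and section labels are hypothetical and do not reflect the actual CE audit-trail layout, and the breakoff share is simply the fraction of selected cases whose last completed section matches a given label.

```python
from dataclasses import dataclass
from typing import Optional

EXPENDITURE_SECONDS_THRESHOLD = 65  # time threshold described in the presentation


@dataclass
class Case:
    """Hypothetical audit-trail summary for one Interview Survey case."""
    case_id: str
    is_noninterview: bool         # final disposition is a noninterview
    has_demographics: bool        # demographic information was collected
    expenditure_seconds: float    # total time spent in expenditure sections
    last_completed_section: Optional[str] = None  # section completed just before breakoff


def insufficient_partials(cases):
    """Noninterviews with demographics and more than 65 seconds in expenditure sections."""
    return [
        c for c in cases
        if c.is_noninterview
        and c.has_demographics
        and c.expenditure_seconds > EXPENDITURE_SECONDS_THRESHOLD
    ]


def breakoff_share(partials, section):
    """Fraction of partial cases that broke off right after the given section."""
    if not partials:
        return 0.0
    return sum(c.last_completed_section == section for c in partials) / len(partials)
```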
Session 3: Alternate modes of data collection
As response rates decline and costs of fielding in-person surveys increase, surveys are looking at alternate modes of data collection. Online data collection is being explored by many household surveys, and phone data collection has increased in many surveys that previously relied on in-person data collection. The three presentations by BLS surveys in this session discuss attempts to evaluate or test the feasibility of online or phone data collection and the associated tradeoffs.
Field testing the CE Online Diary, Parvati Krishnamurty (CE). Dr. Krishnamurty described the CE plans to field test online diaries. The CE Diary Survey uses two 1-week paper diaries to collect information on day-to-day expenses from households. The online diary development predates the CE redesign effort. Theoretically, the online diary provides a way for respondents to enter information contemporaneously, thus improving data quality. Early tests only had a desktop diary; mobile capabilities were added in later versions. The current version of the online diary is device-optimized for display on desktop/laptop or mobile screens. Over time, the design has moved from personal diaries to one diarykeeper per household, since prior tests indicated a lack of improvement in data quality and difficulties with implementing personal diaries.
The Large Scale Feasibility Test (LSF), a field test of online diaries for the Diary Survey, is being fielded from October 2019 through April 2020. The LSF will build on information learned from prior tests and will have sufficient sample to make statistical inferences. The LSF will answer questions about how best to operationalize the online diary and assess any differences in data quality from the online diaries compared with the current paper diaries. The LSF will have a starting sample of 2,500 addresses. Respondents will receive an advance postcard, and half the sample will receive a $5 cash incentive with the advance letter by priority mail. Based on eligibility for the online diary, respondents will be encouraged to keep online diaries rather than being offered a mode choice.
There will be two in-person visits to each household: one to place the diaries and train respondents to use them, and one to complete the diary process and enter any expenses that the respondent forgot to enter during the diary-keeping period. For the online diary we chose a web-based diary rather than an app, as we were concerned that respondents would be unwilling to load an app onto their mobile phones. The diary is based on the online diary built by Westat for an earlier test, which has been updated by Census for the LSF. Dr. Krishnamurty went over some of the functionality of the online diary with screenshots of how to change passwords and user names; enter, save, or delete expenses; and search for expenses. The next steps will be data analysis that will inform our decision on whether to implement the online diary in 2021. Further refinements of the online diary design will be made on the basis of LSF and online testing.
Factors to consider for web-based collection of time diary data in the ATUS, Rose Woods (ATUS). Ms. Woods described the ATUS consideration of offering a web-based mode of data collection to improve survey response. The core part of the ATUS is a time diary in which respondents report the activities they did on the previous day; in addition, interviewers ask a number of contextual questions about the activities to ensure they can be accurately coded. BLS hired Westat to review the literature on web-based collection in time use surveys, to propose a survey design for the ATUS that includes a self-administered online collection mode, and to provide specific recommendations on layout, probes, and technological needs for an online diary. BLS had questions and concerns about the level of activity detail that could be collected in an online diary and the potential implications of online collection on respondent burden, data quality and comparability, and ATUS products.
Change in primary mode from in-person to phone interviews in the NLSY79 and NLSY97, Holly Olson (NLS). Ms. Olson shared the experiences of the NLSY switch from in-person to phone data collection. The primary reason for this change in primary mode was cost. However, there was also concern about overall survey response rate, sample retention, and sample representation. Evaluating the change of mode was complicated by the fact that the NLSY97 had moved to biennial administration right before the mode change, and the change was sudden. Some design considerations for the phone data collection instrument were that the respondent environment when answering the survey was unknown, there were no visual aids to assist the respondent, and it was more difficult for the respondent to retain a long string of information. In addition, since this is a longitudinal survey, there was concern over how existing survey respondents familiar with the in-person mode would transition to the phone mode. The lessons learned from this NLS experience were (1) to allow sufficient time for in-person outreach to get in touch with the sample unit; (2) the importance of midfield flexibility; and (3) to ensure that interviewers “owned” the cases they worked.
Summary of symposium discussion
The CE program office is grateful to the presenters who shared their experiences in dealing with the major challenges and solutions facing household surveys. A selection of the key takeaways from those discussions follows:
- Each incentive delivery method has its pros and cons. Debit cards are more expensive to set up, and there are operational challenges associated with debit card use, including respondents not receiving the cards or not being able to activate them. Electronic payments are easy to use and inexpensive to set up, but such payments are more prevalent among younger respondents. Even within one agency, incentives can look very different on different surveys, and incentive methods need to keep up with rapid technology changes and new methods of payment.
- Incentives can also be used to reduce the burden on the survey program by encouraging early response, inbound calling, appointment setting, and behaviors thought to improve data quality, like using records to answer questions. The presentations indicate that incentives can be successfully targeted to underrepresented groups (as in the ATUS and NLS) or to a specific behavior, such as consulting records (as in the CE).
- In the data quality session, discussion focused on the different aspects of data quality. In the CE and CPS (CWS) presentations, the focus was on accuracy, while the ATUS and CE presentations emphasized representativeness. Thresholds for which cases to include as an interview seem to matter and need to be examined further, as we heard in the CE and ATUS presentations. The key insight from this session was that assessing data quality is important. It needs to be done when planning to collect data (e.g., capture paradata), before collecting data (e.g., test question wordings), while collecting data (e.g., listen to interviews), and after data collection (e.g., dig deep into the metrics).
- In the session on alternate modes of data collection, there was some discussion on the tradeoffs of going online. While the CE is currently testing an online instrument in the field, ATUS has not developed an online instrument because there are concerns about the limitations of online data collection such as higher breakoffs, inability to prompt respondents, and difficulties with collecting complex data.
- In response to declining response rates and the higher costs of in-person data collection, many surveys including the CE are finding it necessary to employ more phone-based data collection. On the basis of NLSY’s experience of switching from a mainly in-person survey to a phone survey, an important takeaway is that changing mode requires careful planning and redesign of the questions and, if possible, should not be done hastily as a cost-cutting measure.
Microdata users’ workshop
Meet with an expert: Beginning with the 2017 workshop, the CE organizers have included a feature called the “Meet with an expert” program. The purpose of the program is to provide an opportunity for attendees to have in-depth, one-on-one meetings with members of the CE staff, wherein the attendees can ask questions and receive comments and other guidance about the projects in which they are engaged.8
The program has proven beneficial to attendees, and also to CE staff, who learn more about how researchers are using the data, and about factors related to data, documentation, etc., that can be improved. This success continued at the 2019 workshop, and, as a result, the program is being continued for the 2020 Microdata Users’ Workshop. Attendees are able (and encouraged) to arrange meetings via registration form, email, or onsite form.
The first session of the 2019 workshop consisted of presenters from the CE program. After welcoming remarks by Branch of Information and Analysis (BIA) Chief Steve Henderson, Program Manager Adam Safir provided an overview of the CE, featuring topics including how the data are collected and published. Economist Jimmy Choi (BIA) then presented an introduction to the microdata, including how they can be used in research and the types of documentation about them available to users. Economist Taylor Wilson (BIA) completed the session with a description of data file structure and variable naming conventions.
Afterwards, attendees received their first practical training with the data. In this session, they learned basic data manipulation, including how to compute means from the microdata for consumer units with different characteristics (e.g., by number of children present).
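The workshop’s own training materials are not reproduced here, but the basic operation (a mean computed separately for each group of consumer units) can be sketched in a few lines. This is a minimal sketch that assumes each consumer unit is a dictionary with hypothetical field names; real variable names come from the CE data dictionary. The optional weight argument anticipates the later sessions on weighting; without it, every consumer unit counts equally.

```python
from collections import defaultdict


def group_means(records, group_field, value_field, weight_field=None):
    """Mean of value_field within each group; weighted when weight_field is given."""
    totals = defaultdict(float)
    denominators = defaultdict(float)
    for rec in records:
        weight = rec[weight_field] if weight_field else 1.0
        group = rec[group_field]
        totals[group] += weight * rec[value_field]
        denominators[group] += weight
    return {g: totals[g] / denominators[g] for g in totals}


# Hypothetical consumer-unit records; real field names come from the CE data dictionary.
cus = [
    {"n_children": 0, "food": 100.0, "weight": 1.0},
    {"n_children": 0, "food": 200.0, "weight": 3.0},
    {"n_children": 1, "food": 300.0, "weight": 2.0},
]
print(group_means(cus, "n_children", "food"))            # {0: 150.0, 1: 300.0}
print(group_means(cus, "n_children", "food", "weight"))  # {0: 175.0, 1: 300.0}
```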
To start the afternoon sessions, Senior Economist Aaron Cobet (BIA) explained the need to balance confidentiality concerns of respondents with usefulness of the data to researchers. Because U.S. Code Title 13 requires confidentiality of response, information that might potentially identify specific respondents must be removed from the CE data before they are released publicly. Some identifiers are direct, such as names and addresses. Others are not direct, such as extremely high expenditures or make and model of automobile(s) owned.
Mr. Cobet explained the methods used to produce the CE microdata files to address these disclosure concerns. The first method, called “topcoding,” involves reported values for income or expenditures that exceed a certain threshold, called the “critical value.” These values are replaced by an average of all values exceeding this threshold and then “flagged” as topcoded (or “bottom-coded,” in the case of large income losses).9 He also explained “recoding,” in which data are either made less precise (e.g., if the owned automobile was produced in 1999, the year is replaced with the decade of manufacture [1990s in this example]) or changed in another way (e.g., state of residence is changed to a nearby state) to preserve both comparability and confidentiality.
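The topcoding and recoding steps can be illustrated with a short sketch. It assumes, for simplicity, that “exceeds the critical value” means strictly greater than, and that the replacement is the unweighted mean of the values above the threshold; the CE program’s actual procedure for choosing critical values and computing the replacement may differ. Bottom-coding of large income losses would mirror this with the comparison reversed and is not shown. Function names are hypothetical.

```python
def topcode(values, critical_value):
    """Replace values above critical_value with the mean of those values.

    Returns (coded_values, flags), where flags marks each topcoded entry.
    """
    above = [v for v in values if v > critical_value]
    if not above:
        return list(values), [False] * len(values)
    replacement = sum(above) / len(above)
    coded = [replacement if v > critical_value else v for v in values]
    flags = [v > critical_value for v in values]
    return coded, flags


def recode_year_to_decade(year):
    """Make a model year less precise, e.g., 1999 -> '1990s'."""
    return f"{year // 10 * 10}s"


coded, flags = topcode([10, 50, 200, 400], critical_value=100)
print(coded)   # [10, 50, 300.0, 300.0]
print(flags)   # [False, False, True, True]
print(recode_year_to_decade(1999))  # 1990s
```

Replacing each extreme value with the group average hides the individual reports while preserving their total, which is why topcoded means remain usable for aggregate estimates.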
Mr. Cobet next explained suppression, in which reported values are removed from the data set. In some cases, only specific information is suppressed on a record (e.g., details of a specialized mortgage). In other cases, the entire record is removed (e.g., report of a purchase of an airplane).10 Finally, Mr. Cobet talked about methods to eliminate “reverse engineering,” a process through which the user could deduce protected information from other information provided in the publicly available files.11
Following this presentation, practical training resumed with a project designed to obtain sample means based on detailed data on educational expenditures derived from various files.12 Attendees also learned how to integrate results from the Interview and Diary Surveys to match expenditure categories in CE published tables.
The afternoon continued with one invited speaker. Arcenis Rojas (GRI, Inc.), formerly an economist in BIA, presented an overview of an R package he wrote to facilitate use of the CE PUMD files for analysts who use the R software. Among other features, the package facilitates the downloading of CE PUMD from within R, annualizes either Diary or Interview expenditures, integrates Interview and Diary data as necessary (for those who want to match published tables), and calculates weighted CE quantiles.
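To illustrate what a weighted quantile is, the sketch below uses one common definition: the smallest value whose cumulative weight reaches the fraction p of the total weight, with no interpolation. It is written in Python to keep the examples in one language and is not the R package’s code; several quantile definitions exist, and the package may use a different one.

```python
def weighted_quantile(values, weights, p):
    """Smallest value whose cumulative weight reaches p * (total weight)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    target = p * total
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= target:
            return value
    return pairs[-1][0]  # guard against floating-point shortfall at p == 1


# Equal weights give the ordinary (lower) median; a heavy weight pulls it upward.
print(weighted_quantile([10, 20, 30, 40], [1, 1, 1, 1], 0.5))  # 20
print(weighted_quantile([10, 20, 30, 40], [1, 1, 1, 5], 0.5))  # 40
```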
Presentations from researchers not affiliated with the CE program completed the afternoon activities. Summaries of the papers presented by outside researchers are included at the end of this Conference Report.
The first speaker, doctoral student Kennan Cepa (Sociology, University of Pennsylvania), spoke about her use of CE microdata to study expenditures on higher education across the income distribution.
The second speaker in this session was Dr. Monika Hu, former recipient of the American Statistical Association (ASA)/BLS/National Science Foundation (NSF) Fellowship and currently a professor at Vassar College. Dr. Hu described her use of CE data in teaching undergraduate statistics.
The final speaker in this session was Dr. Jonathan Peters of the College of Staten Island and The City University of New York (CUNY) Graduate School, a veteran presenter at three prior CE workshops (2014, 2017, and 2018). Dr. Peters compared demographics and expenditure patterns for families with and without certain types of transportation expenditures, such as gasoline and taxi usage.
Following Dr. Peters’ presentation, the afternoon concluded with a networking opportunity for attendees. The event was an informal gathering to allow them to meet each other and to initiate or renew contacts with staff of the CE program.13
The second day opened with more advanced topics. First, economist Brett Creech (BIA) delivered a presentation, new to the workshop, that explained the role of source selection in the publication process. In standard tables published by the CE program, some expenditure data come from the Diary Survey and some come from the Interview Survey. Sometimes, an expenditure is collected in only one of the two surveys. For example, detailed food expenditures, such as apples or bananas, are collected only in the Diary Survey. Similarly, expenditures on trips of a certain length or duration are collected only in the Interview Survey. But some items, such as certain types of apparel, are collected in both. He explained how the CE program selects the source for such items. He showed attendees a table available online (https://www.bls.gov/cex/ce_source_integrate.xlsx) that identifies which survey serves as the source for detailed publication items.
During the Q&A session for this presentation, Dr. Geoffrey Paulin, a senior economist in the CE program (BIA), noted that the table is useful for users interested in computing summary variables at different levels than are available in the microdata files. For example, for users’ convenience, the Diary PUMD files include a summary variable comprising all fresh fruit expenditures collected in the CE (apples; bananas; oranges; citrus fruits, excluding oranges; and other fresh fruits). But a user interested in total expenditures for all citrus fruits must identify the appropriate items (oranges; and citrus fruits, excluding oranges), each of which is identified by a unique code, and then aggregate the expenditures for those codes. The table provides these codes, so the user need not search other documentation to find this information.
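As a rough illustration of the aggregation Dr. Paulin described, the following Python sketch sums expenditures per consumer unit over a chosen set of item codes. The record layout (`cu_id`, `code`, `cost`) and the code strings are hypothetical placeholders, not actual CE file variables or codes; in practice, the codes would be taken from the source-selection table.

```python
from collections import defaultdict

# Hypothetical item codes standing in for the codes listed in the CE
# source-selection table; they are NOT real CE codes.
ORANGES = "ITEM_ORANGES"
OTHER_CITRUS = "ITEM_CITRUS_EXCL_ORANGES"
APPLES = "ITEM_APPLES"

# "All citrus fruits" = oranges + citrus fruits, excluding oranges.
CITRUS_CODES = {ORANGES, OTHER_CITRUS}


def total_by_cu(records, codes):
    """Sum expenditures per consumer unit for records whose code is in `codes`."""
    totals = defaultdict(float)
    for rec in records:
        if rec["code"] in codes:
            totals[rec["cu_id"]] += rec["cost"]
    return dict(totals)


# Made-up diary records for two consumer units.
records = [
    {"cu_id": 1, "code": ORANGES, "cost": 4.50},
    {"cu_id": 1, "code": OTHER_CITRUS, "cost": 2.25},
    {"cu_id": 1, "code": APPLES, "cost": 3.00},
    {"cu_id": 2, "code": APPLES, "cost": 5.00},
    {"cu_id": 2, "code": OTHER_CITRUS, "cost": 1.75},
]

print(total_by_cu(records, CITRUS_CODES))  # {1: 6.75, 2: 1.75}
```

Apples are excluded from the citrus total even though they belong to the published fresh fruit summary, which is exactly why the item-level codes are needed.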
Next, statistician Brian Nix of the BLS Division of Price Statistical Methods presented technical details about sampling methods and construction of sample weights.
The concluding presentation of this session featured economist Taylor Wilson (BIA), who introduced experimental weights for estimating state-level expenditures with the CE microdata. He noted that weights for New Jersey, California, Florida, New York, and Texas were available (https://www.bls.gov/cex/csxresearchtables.htm#stateweights).14 Mr. Wilson also presented the criteria the CE division uses to assess the feasibility of devising weights for other states.
After a break following the opening session, Dr. Paulin described the correct use of sample weights in computing consumer unit population estimates. He noted that the proper use of weights requires a special technique to account for sample design effects; without it, the resulting estimates of variances and regression parameters are incorrect. He also mentioned a topic of perennial interest to CE microdata users: caveats concerning the use of data only from respondents who complete all four interviews of the Interview Survey.15 This led into a practical training session devoted to computing weighted results in two projects: one for collection year estimates and the other for calendar year estimates. The distinction is that collection year refers to the date on which the respondent reported the expenditures to the interviewer, while calendar year refers to the period in which the expenditures actually occurred. For example, for a person participating in the Interview Survey in January 2018 who reports expenditures that occurred during the final 3 months of 2017 (October, November, or December), the expenditure collection year is 2018, while the expenditure calendar year is 2017.
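The collection year versus calendar year distinction can be sketched in a few lines of Python. The function below is a hypothetical helper, not part of any CE file or program, and it assumes, as in the January 2018 example above, that the reference period is the 3 full calendar months preceding the interview month.

```python
def reference_months(interview_year, interview_month):
    """Return (year, month) pairs for the 3 months before the interview.

    Assumes the reference period is the 3 full calendar months preceding
    the interview month (hypothetical helper for illustration only).
    """
    months = []
    for back in (3, 2, 1):
        # Count months from year 0, step back, then split into year and month.
        index = interview_year * 12 + (interview_month - 1) - back
        months.append((index // 12, index % 12 + 1))
    return months


# Interview in January 2018: collection year 2018, calendar year 2017.
print(reference_months(2018, 1))  # [(2017, 10), (2017, 11), (2017, 12)]
```

Working in a single month index avoids special-casing the year boundary, which is precisely where collection year and calendar year diverge.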
Upon completion of this presentation, practical training resumed. Using detailed PUMD files, attendees learned how to obtain information on nonexpenditure characteristics associated with certain educational expenditures, such as the type of school attended.16
Attendees also received an introduction to the procedures needed to compute consumer-unit-population weighted averages for expenditures; that is, instead of computing mean expenditures from the sample itself, how to apply weights to estimate mean expenditures for the consumer unit population as a whole.17
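Using the two-consumer-unit example from endnote 17, the following Python sketch contrasts the unweighted sample mean with the population-weighted mean. It covers only the point estimate; it does not implement the design-based variance technique mentioned earlier.

```python
def weighted_mean(values, weights):
    """Population-weighted mean: sum(weight * value) / sum(weight)."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


spending = [150.0, 0.0]     # expenditures reported by the two sample CUs
weights = [10_000, 20_000]  # number of population CUs each one represents

sample_mean = sum(spending) / len(spending)
population_mean = weighted_mean(spending, weights)
print(sample_mean, population_mean)  # 75.0 50.0
```

The weighted result is lower because the consumer unit that spent nothing represents twice as many consumer units in the population.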
Following this training, Dr. Lyubov Kurkalova, professor of agricultural economics at North Carolina A&T State University, presented her work on using CE data to study food deserts in North Carolina. The work examines expenditures for both food at home (generally, food purchased at grocery stores and similar outlets) and food away from home (generally, food purchased from restaurants, employer cafeterias, and similar outlets) by demographic groups.18 Dr. Kurkalova is part of a team that is working toward a predictive model of food consumption patterns as functions of these demographics, income, and other data not available in the CE, such as food prices. Of note, one of Dr. Kurkalova’s coauthors, Dr. Kathleen Liang (North Carolina A&T State University), attended the 2017 workshop and recommended that Dr. Kurkalova present at the 2019 event.
After a break, Charlotte Irby, BLS technical writer-editor of the Monthly Labor Review (MLR), described the MLR publication process, from submission to posting, for authors interested in having their work appear in the MLR.
After additional practical training, the afternoon concluded with a special presentation and commentary by two CE staff members. Julie Sullivan, an economist with the Branch of Production and Control (P&C), delivered a presentation on research she is conducting into expenditure patterns of married couples with children of different ages in which at least one spouse works full time. The presentation was unusual because the work was based on internal files rather than the PUMD files that the non-BLS presenters used.
For this reason, Dr. Paulin provided commentary on the presentation, focusing on how non-BLS researchers who wish to pursue similar research could do so using the PUMD files. As part of this commentary, Dr. Paulin compared and contrasted the data files, noting that PUMD users have some advantages over internal data file users. For example, in the case of married couples, much demographic information (age, occupation, etc.) for both spouses is available on one file in the PUMD. For internal users, only the information about the reference person is available on the equivalent file, with the spouse’s information on another; the files must be merged to obtain a single data set with the required information. Dr. Paulin also described summary variables that are available to PUMD users but must be computed by internal file users. The main advantage of the internal files is that they are not subject to topcoding, suppression, or other disclosure-protection procedures. Obtaining access to these files requires special clearance, available through the BLS Visiting Researcher Program, which is not administered by the CE staff; CE liaison Jimmy Choi described the program further in a presentation on day three.
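The merge that internal users must perform can be sketched in Python as a left join on a consumer unit identifier. The record layouts and field names here (`cu_id`, `age`, `occupation`) are hypothetical, because this report does not describe the internal file layouts; the sketch shows only the general technique.

```python
# Hypothetical records: one file holds the reference person, another the spouse.
reference_people = [
    {"cu_id": 101, "age": 41, "occupation": "teacher"},
    {"cu_id": 102, "age": 35, "occupation": "nurse"},
]
spouses = [
    {"cu_id": 101, "age": 43, "occupation": "engineer"},
]


def merge_spouse(ref_rows, spouse_rows):
    """Attach spouse fields (prefixed 'spouse_') to each reference person's row.

    Consumer units without a spouse record get None for the spouse fields,
    as in a left join keyed on cu_id.
    """
    by_cu = {row["cu_id"]: row for row in spouse_rows}
    merged = []
    for ref in ref_rows:
        spouse = by_cu.get(ref["cu_id"], {})
        merged.append({
            **ref,
            "spouse_age": spouse.get("age"),
            "spouse_occupation": spouse.get("occupation"),
        })
    return merged


merged = merge_spouse(reference_people, spouses)
for row in merged:
    print(row)
```

A left join keeps every reference person, so consumer units without a spouse remain in the analysis rather than silently dropping out.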
The final day started with CE staff discussing advanced topics. First, economist Barbara Johnson-Cox (P&C) explained how sales taxes are applied to expenditure reports during the data production process. Then, economist Clayton Knappenberger (P&C) spoke about imputation and allocation of expenditure data in the CE.
Following a break, a panel moderated by Dr. Paulin addressed programs that give researchers access to the internal CE files. The first panelist, Dr. Monika Hu, described her experiences as a recipient of the ASA/BLS/NSF Fellowship, including tips for those interested in applying for the program. Dr. Hu’s research involves the use of synthetic data for geographic variables, such as small cities or counties, which are not currently available to users of the CE PUMD files, and for highly skewed variables, such as consumer unit (CU) income, which is currently topcoded in the CE PUMD files. Next, Dr. Paulin read a statement from Dr. Hu’s predecessor in this program, Dr. Kelly McConville, a professor at Reed College. During her tenure as a fellow, Dr. McConville investigated methods of imputing assets and liabilities in the CE. Like Dr. Hu’s remarks, her written statement described her experiences and provided tips for potential applicants. Concluding the panel, Jimmy Choi (BIA), the CE liaison to the BLS Visiting Researcher Program, explained the application process and also provided tips on applying.
In each case, the advice of the speakers was to plan well and to contact CE staff before starting the application process. This is to ensure that the data of interest are available internally, that the proposal is of interest to the CE program, and that it is written so that the goals, and tasks for achieving them, are clear to the acceptance committee, including CE staff. Proposals should be free from jargon, unexplained specialized terms, and unnecessary abbreviations or acronyms.
The final presentations of the workshop followed the panel. The first of these was a “sneak peek” of developments for CE publications and microdata. Starting with those recently implemented, BIA Chief Steve Henderson noted the addition of two new states (New York and Texas) to the list of state weights, as described by Taylor Wilson on day two, as well as a new question, added in 2019, that asks whether anyone in the consumer unit has previously served in the U.S. military. This question, which supplements an existing question asking whether anyone in the consumer unit is currently serving in the U.S. military, was added in response to requests from several federal agencies for information on the economic status of U.S. veterans. He also described several forthcoming developments,19 including plans to publish data at more refined geographic levels (census division in addition to the current census region) and to add a new column to the (still new) generational tables (first published officially to reflect 2016 data) showing expenditures for the “post-Millennial” generation, expected in 2020.20 Of particular interest to PUMD users, he noted that the series of free data was expected to expand. At the time of the workshop (and through the start of 2020), data from 1996 onward were available for free download on the PUMD website, while data prior to 1996 were available for purchase. However, as of January 16, 2020, data from 1980 forward are available for free download.
Continuing the “sneak peek” theme, Dr. Paulin described work in progress within the CE program to impute values for assets owned and liabilities owed when a respondent reports holding an asset or liability but not its specific value.
To conclude the morning session, Dr. Erica Yu (BLS) led the workshop attendees in a feedback session. In the session, attendees had the opportunity to provide comments on what they found most (or least) useful about the workshop, and to make suggestions for future events. Many comments were positive, with attendees liking the progressive nature of the workshop (i.e., starting with the most basic information about the data collection and file structures and moving to the most technical topics at the end) and complimenting the aforementioned “Meet with an expert” program. Workshop attendees also provided thoughtful comments on what could be improved, much of it administrative and addressable. The team is either implementing the suggestions now or considering the best ways to do so.
The final training session was devoted to the computation of means, standard errors, and regression parameter estimates when using multiply imputed data, such as income in the CE. In addition, those interested received an instruction manual for a SAS program that is available with the microdata. This program helps CE microdata users easily compute correct standard errors for means and regression results when using (1) unweighted nonimputed data, (2) population-weighted nonimputed data, and (3) multiply imputed income data, both unweighted and population weighted. Finally, a few attendees took one last opportunity to meet with an expert at this year’s workshop.
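A standard way to combine results across multiply imputed data sets is Rubin’s combining rules. The following Python sketch applies them to a single estimate; it is not the BLS SAS program mentioned above, whose exact computations this report does not describe, and the five sets of results are invented for illustration. It also ignores population weights and design-based variances.

```python
import math
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Combine results from m completed (imputed) data sets with Rubin's rules.

    estimates: the point estimate (e.g., mean income) from each data set
    variances: the squared standard error of that estimate from each data set
    Returns (combined estimate, combined standard error). Requires m >= 2.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # combined point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)


# Made-up results from five imputed versions of an income variable.
estimates = [100.0, 102.0, 98.0, 101.0, 99.0]
variances = [4.0, 4.0, 4.0, 4.0, 4.0]
est, se = combine_imputations(estimates, variances)
print(est, round(se, 4))  # 100.0 2.6458
```

The between-imputation term is what a naive analysis of a single imputed value omits, which is why such standard errors come out too small.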
Symposium and workshop of 2020
The next Survey Methods Symposium is scheduled for July 21, 2020, in conjunction with the 15th annual Microdata Users’ Workshop (July 22–24), at the BLS National Office in Washington, D.C. Although the symposium and workshop remain free of charge to all participants, advance registration is required (https://data.bls.gov/forms/cex-registration.htm). For more information about these and previous events, visit the CE website (https://www.bls.gov/cex/) and look for the section of the left navigation bar titled “CE WORKSHOP AND SYMPOSIUM.” For direct access to this information, go to https://www.bls.gov/cex/csxannualworkshop.htm. Links to the agendas for the 2019 workshop (https://www.bls.gov/cex/ce-2019-workshop-agenda.pdf) and the 2019 symposium (https://www.bls.gov/cex/ce-2019-symposium-agenda.pdf) are also available on this web page. Both agendas include links to presentations delivered at the respective events.
Highlights of workshop presentations
The following are highlights of the papers presented during the workshop, listed in the order of presentation. They are based on summaries written by the respective authors.
Kennan Cepa, doctoral student in sociology, University of Pennsylvania, “In What Contexts do Households Maintain their Advantage? Mechanisms for Widening Household Expenditures on Higher Education across the Income Distribution” (Interview Survey), day one.
The goal of my research is to examine how state-level variation in postsecondary systems shapes household expenditures on higher education. To do this, I connect the state variable in the CE data to state-level higher education information from the Integrated Postsecondary Education Data System (IPEDS), and I specifically examine state-level differences in the degree of differentiation (i.e., public vs. private and 2-year vs. 4-year institutions) and privatization (i.e., publicly vs. privately funded institutions) in higher education systems across states. I contend that these state-level differences shape how much families across the income gradient spend on their children’s postsecondary ventures, with implications for inequality and households’ financial well-being.
Jingchen (Monika) Hu, Ph.D., professor of mathematics and statistics, Vassar College, “Using CE Microdata in Undergraduate Statistics Courses” (Interview Survey), day one.
I have been using samples of the CE microdata in undergraduate statistics courses. Most recently, I have used samples of the 2017 Q1 PUMD data sets to teach topics in a Bayesian statistics course at Vassar College, including Bayesian inference for a mean, Gibbs sampling and Markov chain Monte Carlo (MCMC) estimation, and Bayesian linear regression. I have mostly used the total expenditure, total income, rural status, and race of reference person variables.
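To make one of these topics concrete, the following Python sketch runs a Gibbs sampler for the mean and variance of a normal model. It is not Dr. Hu’s course material: the semi-conjugate priors (normal for the mean, inverse-gamma for the variance) and their hyperparameters are assumptions chosen for illustration, and simulated data stand in for a CE variable such as total expenditure.

```python
import random
from statistics import mean


def gibbs_normal(y, n_iter=5000, mu0=0.0, tau0_sq=1e6, a=0.01, b=0.01, seed=1):
    """Gibbs sampler for y_i ~ Normal(mu, sigma^2).

    Priors (assumed for illustration): mu ~ Normal(mu0, tau0_sq) and
    sigma^2 ~ Inverse-Gamma(a, b). Returns the draws of mu and sigma^2.
    """
    rng = random.Random(seed)
    n, y_bar = len(y), mean(y)
    sigma_sq = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)  # starting value
    mu_draws, sigma_sq_draws = [], []
    for _ in range(n_iter):
        # Full conditional of mu given sigma^2 is normal.
        precision = 1 / tau0_sq + n / sigma_sq
        mu_n = (mu0 / tau0_sq + n * y_bar / sigma_sq) / precision
        mu = rng.gauss(mu_n, (1 / precision) ** 0.5)
        # Full conditional of sigma^2 given mu is inverse-gamma:
        # draw 1/sigma^2 from Gamma(shape, scale = 1/rate), then invert it.
        ss = sum((yi - mu) ** 2 for yi in y)
        sigma_sq = 1 / rng.gammavariate(a + n / 2, 1 / (b + ss / 2))
        mu_draws.append(mu)
        sigma_sq_draws.append(sigma_sq)
    return mu_draws, sigma_sq_draws


# Simulated stand-in for a CE variable (not actual CE data).
data_rng = random.Random(0)
y = [data_rng.gauss(50, 10) for _ in range(200)]

mu_draws, sigma_sq_draws = gibbs_normal(y)
burn_in = 1000  # discard early draws before summarizing the posterior
print(round(mean(mu_draws[burn_in:]), 2), round(mean(y), 2))
```

With a nearly flat prior on the mean, the posterior mean of `mu` should land very close to the sample mean, which makes this a convenient check when first introducing MCMC output.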
Jonathan Peters, Ph.D., professor of finance, College of Staten Island and CUNY Graduate School, “Examining Changes in Households Spending Patterns in Response to Changes in Transportation Usage and Transportation Unit Costs” (Interview and Diary Surveys), day one.
This paper explores variations in household spending on transportation services and the impact of these costs on other household consumption categories. Transportation costs, a major household expense consuming roughly 17 percent of household income, are particularly significant for low- and moderate-income households. We seek to understand how changes in transportation costs, by expense type, affect the other components of household consumption for various types of households. For example, we explore how households that face costs related to road tolls alter their consumption baskets compared with others in their region who travel by other modes. As a second case, we would like to understand how households with high for-hire vehicle use (e.g., taxi services) compare with other households in terms of other mobility and consumption categories. Finally, we would like to examine “crowding out” (or the indirect income effect) in general: the condition in which an increase in a tax, fee, price, or charge forces households to alter their consumption basket, reducing spending in some areas to compensate for higher costs in others. We believe that the radical shifts in transportation spending and usage over the last 20 years will allow us to identify the impact of crowding out in various household spending categories.
Lyubov Kurkalova, Ph.D., professor of agricultural economics, North Carolina A&T State University, “Opportunities and Challenges of using the CE Microdata to Study Food Deserts in North Carolina” (Diary Survey), day two.
Food deserts have become a serious issue in rural and urban communities. North Carolina has the capacity to produce a large variety of vegetables, but recent research suggests that most households living in a food desert have difficulty accessing fresh vegetables. The study aims to understand how expenditures on fresh vegetables at grocery stores, and the share of vegetables in total food expenditures, have changed over time. We also assess similar changes for food purchased away from home. Both sets of trends are quantified by demographic group and for the state as a whole. By combining the CE data with other data, we gain insight into the challenges and gaps in the understanding of food desert issues, and into how demographic information relates to changes in vegetable availability and consumption patterns over time.
Julie Sullivan, economist, CE Program (P&C), “Comparing Selected Expenditures of Dual and Single Income Households with Children” (Interview and Diary Surveys), day two.
The percentage of dual-income households has been on the rise since the 1960s and surpassed the percentage of father-only employed households in the 1980s. The rise is most likely a result of a cultural shift regarding women in the workforce. It is important to monitor and analyze this trend, as the expenditure habits of dual-income families are characteristically different from those of single-income households and are therefore expected to influence the U.S. economy as they become even more common.
In this work, I estimate the proportions of couple-led households with children younger than 18 that have dual versus single earners. I also examine how this now-dominant lifestyle choice affects household income and expenditures, with a particular emphasis on spending for food away from home (e.g., purchased at restaurants), food at home (e.g., purchased at grocery stores), transportation (public transportation and gas), education, and childcare. These expenditure categories are selected for analysis because it is hypothesized that working full time entails tradeoffs with time for meal preparation and for child rearing, as well as higher transportation expenses from two commuters. Finally, I compare income and outlay amounts across the different groups. I use both Interview and Diary Survey data from 2015 to 2017 to analyze the selected expenditures.
Jingchen (Monika) Hu, Ph.D., professor of mathematics and statistics, Vassar College, “Exploring Synthetic County Labels in the Consumer Expenditure Survey Microdata” (Interview Survey), day three.
As an ASA/NSF/BLS Fellow in 2018, I collaborated with statisticians and economists at BLS on two projects. First, I worked on simulating synthetic county labels of consumer units for release consideration in the CE PUMD. Releasing synthetic county labels can expand the accessibility of such variables in the CE PUMD while maintaining low levels of disclosure risks. Second, I experimented with simulating synthetic CU income to replace the topcoding procedure in the CE PUMD to provide a better balance of data utility and disclosure risks. In addition, I shared my thoughts and tips on applying to the ASA/NSF/BLS Fellowship Program.
Staff of the CE program
Choi, Jimmy. Economist, Branch of Information and Analysis (BIA); days one and three
Cobet, Aaron. Senior Economist, BIA; day one
Creech, Brett. Economist, BIA; day two
Curtin, Scott. Supervisory Economist, Chief, Public Use Microdata Production Section, BIA; practical training sessions; days one, two, and three
Henderson, Steve. Supervisory Economist, Chief, BIA; days one and three
Johnson-Cox, Barbara. Economist, Branch of Production and Control (P&C); day three
Knappenberger, Clayton. Economist, P&C; day three
Paulin, Geoffrey. Senior Economist, BIA; introducer of speakers, commentator, and practical trainer; days one, two, and three
Safir, Adam. Chief, Division of Consumer Expenditure Surveys; day one
Sullivan, Julie. Economist, P&C; day two
Wilson, Taylor. Economist, BIA; days one and two
Other BLS speakers
Nix, Brian. Mathematical Statistician, Division of Price Statistical Methods; day two
Irby, Charlotte. Technical Writer-Editor, Monthly Labor Review; day two
Yu, Erica. Research Psychologist, Office of Survey Methods Research; day three
Cepa, Kennan. Doctoral Student in Sociology, University of Pennsylvania, “In What Contexts do Households Maintain their Advantage? Mechanisms for Widening Household Expenditures on Higher Education across the Income Distribution” (Interview Survey); day one. First-time attendee and presenter (2019).
Hu, Dr. Jingchen “Monika” (Ph.D.). Professor of Mathematics and Statistics, Vassar College, “Using CE Microdata in Undergraduate Statistics” (Interview Survey); day one. “Exploring Synthetic County Labels in the Consumer Expenditure Survey Microdata” (Interview Survey); day three. First-time attendee and presenter (2019).
Kurkalova, Dr. Lyubov (Ph.D.). Professor of Agricultural Economics, North Carolina A&T State University, “Opportunities and Challenges of using the CE Microdata to Study Food Deserts in North Carolina” (Diary Survey); day two. First-time attendee and presenter (2019).
McConville, Dr. Kelly (Ph.D.). Assistant Professor of Statistics, Reed College. Written remarks on her participation in the ASA/BLS/NSF Fellowship program delivered on her behalf by Dr. Geoffrey Paulin (Ph.D.) of the CE staff (BIA); day three.
Peters, Dr. Jonathan (Ph.D.). Professor of Finance, College of Staten Island and CUNY Graduate School; Research Fellow, University Transportation Research Center, “Just What Do We Actually Know about Household Spending on Transportation Services and How Are They Changing in the 21st Century” (Interview and Diary Surveys); day one. Prior presenter (2014, 2017, and 2018); returning presenter (2019).
Rojas, Arcenis. Data scientist, GRI Inc., “Using R with CE Microdata” (Interview and Diary Surveys); day one. Former attendee and presenter as part of BIA staff (2015 through 2018).
Geoffrey D. Paulin and Parvati Krishnamurty, "Consumer Expenditure Survey Methods Symposium and Microdata Users’ Workshop, July 16–19, 2019," Monthly Labor Review, U.S. Bureau of Labor Statistics, April 2020, https://doi.org/10.21916/mlr.2020.5.
1 Although a household refers to all people who live together in the same living quarters, “consumer unit” refers to the people living therein who are a family, or others who share in specific financial arrangements. For example, two roommates living in an apartment constitute one household. However, if they are financially independent, they each constitute separate consumer units within the household. Similarly, although families are related by blood, marriage, or legal arrangement, unmarried partners who live together and pool income to make joint expenditure decisions constitute one consumer unit within the household. For a complete definition, see the CE glossary at https://www.bls.gov/cex/csxgloss.htm. For more information on households and families, see https://www.census.gov/topics/families/data.html.
2 The Quarterly Interview Survey is designed to collect data on expenditures for big-ticket items (e.g., major appliances or automobiles) and recurring items (e.g., payments for rent, mortgage, or insurance). In the Interview Survey, participants are visited once every 3 months for four consecutive quarters. In the Diary Survey, on the other hand, participants record expenditures daily for 2 consecutive weeks. This survey is designed to collect expenditures for small-ticket and frequently purchased items, such as detailed types of food (e.g., white bread, ground beef, butter, or lettuce). The CE microdata for both surveys may be downloaded from the CE website.
Data from the Diary and Interview Surveys are published twice a year in various standard tables. One set describes expenditures that occurred within the calendar year of interest (e.g., January through December 2018 for the most recent set available as of the writing of this report). The other set provides a midyear update to expenditures, ranging from July of the earlier year to June of the later year (e.g., July 2017 through June 2018 for the most recent set available as of the writing of this report). The single-year series is available from 1984 forward. The midyear updates are available from July 2011 to June 2012 onward. Each set includes information on expenditures by age of reference person, composition of consumer unit, income of consumer unit, and other demographics. For a complete list, see https://www.bls.gov/cex/tables.htm.
3 For example, the 2018 Symposium comprised one full day, while the 2019 Symposium was one afternoon only. For more information on the 2018 Symposium, see Geoffrey Paulin and Parvati Krishnamurty, “Consumer Expenditure Survey Methods Symposium and Microdata Users’ Workshop, July 17–20, 2018,” Monthly Labor Review, May 2019, available at https://www.bls.gov/opub/mlr/2019/article/pdf/consumer-expenditure-surveys-methods-symposium-and-microdata-users-workshop-2018.pdf.
4 In 2018, the CE program was working on a separate Interagency Agreement with the Census Bureau to proceed with a large-scale feasibility (LSF) test of the entire redesign. Cost estimates indicated that there were insufficient funds to proceed with such a test, and the plans for the LSF test were scaled back to focus only on the online diary component of Gemini. This experience also made it clear that, because costs had increased since the redesign was first announced, the redesign plan, once implemented, would no longer be budget neutral, as it was originally estimated to be.
5 For more information on the American Time Use Survey (ATUS), see the ATUS section of the BLS website.
6 See A. Mercer, A. Caporaso, D. Cantor, and R. Townsend, “Monetary Incentives and Response Rates in Household Surveys: How much gets you how much?” Public Opinion Quarterly, vol. 79, no. 1 (Spring 2015): 105–129; R. Curtin, E. Singer, and S. Presser, “Incentives in Random Digit Dial Telephone Surveys: A Replication and Extension,” Journal of Official Statistics, vol. 23, no. 1 (2007): 91–105; and E. Singer, J. Van Hoewyk, N. Gebler, T. Raghunathan, and K. McGonagle, “The Effect of Incentives on Response Rates in Interviewer-Mediated Surveys,” Journal of Official Statistics, vol. 15, no. 2 (1999): 217–230.
7 For more information on the National Longitudinal Surveys (NLS), see the NLS section of the BLS website.
8 Attendees were able to sign up for a meeting by checking a box on their registration forms. They could also sign up at the registration desk throughout the workshop. However, the main benefit—both to attendees and CE staff members—of advance registration was to allow the meetings coordinator time to find the most appropriate expert, and time for the expert to investigate the question or prepare other information (handouts, etc.) before the meeting to optimize the quality of the session.
9 For example, suppose the threshold for a particular income or expenditure is $100. On two records, the reported values exceed this: $200 on record A and $600 on record B. In this case, the value is topcoded to $400 (the average of $200 and $600) and the reported amounts are replaced with $400. An additional variable, called a “flag,” is coded to notify the data user that the $400 values are the result of topcoding, not actual reported values.
10 For details on topcoding and suppression, including specific variables affected and their critical values, see the CE PUMD documentation on the CE website.
11 For example, suppose a respondent reports values for two sources of income: (1) wages and salaries and (2) pensions. Further suppose the following: the reported value for wages and salaries exceeds the critical value and is therefore replaced by the topcoded value of $X; the reported value for pension income, $Y, is below the critical value for this income source; and the value for total income is shown as $X + $Y + $Z. Because this respondent reports only two sources of income and pension income is not topcoded, a user could subtract $Y from total income and deduce that the reported value for wages and salaries is $X + $Z. To prevent this, total income must be computed after each individual component has been topcoded as needed. Therefore, in this example, total income is $X + $Y, and the actual reported value of wages and salaries cannot be “reverse engineered.”
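The simplified topcoding rule in endnote 9 and the ordering requirement in endnote 11 can be sketched in Python as follows. This is an illustration of those two examples only: the $50 record and the $80 pension are hypothetical values, and the actual CE procedures, critical values, and flag variables differ by variable and are not reproduced here.

```python
def topcode(values, threshold):
    """Replace every value above `threshold` with the mean of those values.

    Returns (new_values, flags), where a flag is True if the value was
    topcoded. Mirrors the simplified example in endnote 9.
    """
    above = [v for v in values if v > threshold]
    if not above:
        return list(values), [False] * len(values)
    replacement = sum(above) / len(above)
    new_values = [replacement if v > threshold else v for v in values]
    flags = [v > threshold for v in values]
    return new_values, flags


# Endnote 9: threshold $100; records report $50 (hypothetical), $200, and $600.
values, flags = topcode([50, 200, 600], threshold=100)
print(values, flags)  # [50, 400.0, 400.0] [False, True, True]

# Endnote 11: build total income from the already-topcoded components.
topcoded_wages = values[2]   # $X: topcoded value replacing the reported $600
pension = 80                 # $Y: hypothetical, below its critical value
total_income = topcoded_wages + pension
# Subtracting the pension recovers only $X, not the reported wage amount.
print(total_income - pension)  # 400.0
```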
12 The project involved finding and merging results from the FMLI, MEMI, and MTBI files. The FMLI files include general characteristics of the consumer unit (e.g., region of residence, number of members, etc.) and summary variables (e.g., total educational expenditures). The MEMI files contain information on each individual member of the consumer unit (e.g., each member’s age, race, educational attainment, etc.). The MTBI files include expenditures for specific educational expenses (e.g., expenditures on “College tuition,” “Elementary and high school tuition,” “Test preparation, tutoring services,” “School books, supplies, equipment for vocational and technical schools,” etc.).
13 Because the practical training is progressive, until 2011 this activity was held on the second day to maximize overlap in attendance between newer and more experienced users. However, in response to comments from attendees at prior workshops, in 2012 the activity was moved to the first day of the workshop, and it has been held successfully on that day in subsequent workshops.
14 At the time of the workshop, weights for the first three states were available for 2016 and 2017; for the latter two, they were available only for 2017. As of this writing, weights for all five states are also available for 2018.
15 As noted in the introduction to the workshop, the Interview Survey collects data from respondents for four consecutive 3-month periods. During each interview, the respondent is asked to provide information on expenditures for various items during the previous 3 months. However, not all participants remain in the sample for all four of these interviews. Those who do remain have different characteristics (e.g., higher rates of homeownership and higher average age) than those who do not remain. Therefore, attempting to analyze average annual expenditures by only examining respondents who participate in all four interviews yields biased results.
16 Specifically, attendees learned how to access the EDA files to ascertain for what type of school or facility (college or university, elementary through high school, child daycare center, etc.) certain educational expenditures were incurred, and whether the expenditures were for a member of the consumer unit or a gift to someone outside of it.
17 For example, suppose the sample consists of two consumer units, one of which represents 10,000 consumer units in the population (i.e., itself and 9,999 others like it) and another that represents 20,000 consumer units in the population. If the first spent $150 and the second spent nothing (i.e., $0), the sample mean expenditure is $75. But the population-weighted mean is $50, or [($150 x 10,000)+($0 x 20,000)]/(10,000 + 20,000).
18 In the CE, the term “food at home” generally refers to the location of purchase, not place of consumption, of the food. That is, according to the CE glossary, “Food at home refers to the total expenditures for food at grocery stores (or other food stores)…” (https://www.bls.gov/cex/csxgloss.htm). Food purchased from restaurants, food trucks, vending machines, etc., is considered to be “food away from home,” even if it was taken home and eaten there. This includes foods that are delivered directly to the consumer: If purchased from restaurants, these items are food away from home; if purchased from grocery stores, these items are food at home.
19 Among these announcements, Mr. Henderson reported that this will be his last workshop as a BLS employee because he is retiring at the beginning of 2020. This announcement was particularly poignant for members of CE staff, both within BIA and the other branches, who have benefitted over the years from his wise counsel and guidance through various projects, including the workshop, which has been organized under his supervision since its inception in 2006.
20 At present, no consensus has emerged on a name for this group. The CE program has previously followed the nomenclature of the Pew Research Center, which officially defined this group as “post-Millennials” on March 1, 2018. For more information, see “Fun Facts about Millennials: comparing expenditure patterns from the latest through the Greatest generation,” Monthly Labor Review, March 2018, https://doi.org/10.21916/mlr.2018.9, esp. endnote 14; and a Pew Research Center report by Michael Dimock, “Defining generations: Where Millennials end and post-Millennials begin,” (http://www.pewresearch.org/fact-tank/2018/03/01/defining-generations-where-millennials-end-and-post-millennials-begin/).