Consumer Expenditure Surveys Methods Symposium and Microdata Users’ Workshop, July 18–21, 2017
The Consumer Expenditure Surveys (CE) program collects expenditures, demographics, and income data from families and households. To address CE-related topics in survey methods research, provide free training in the structure and uses of the CE microdata, and explore possibilities for collaboration, the CE program held its annual Survey Methods Symposium and Microdata Users’ Workshop from July 18 to 21, 2017. Several economists from the CE program, staff from other U.S. Bureau of Labor Statistics offices, and research experts in a variety of fields—including academia, government, market research, and other private industry areas—gathered together to explore better ways to collect CE data and to learn how to use the microdata once they are produced.
The Consumer Expenditure Surveys (CE) are the most detailed source of expenditures, demographics, and income that the federal government collects directly from families and households (or, more precisely, “consumer units”).1 In addition to publishing standard expenditure tables twice a year, the U.S. Bureau of Labor Statistics (BLS) CE program releases annual microdata on the CE website from its two component surveys (the Quarterly Interview Survey and the Diary Survey). Researchers use these data in a variety of fields, including academia, government, market research, and other private industry areas.2
In July 2006, the CE program office conducted the first in a series of annual workshops in order to achieve three goals: (1) to help users better understand the structure of the CE microdata; (2) to provide training in the uses of the surveys; and (3) to promote awareness, through presentations by current users and interactive forums, of the different ways the data are used, and thus provide opportunities to explore collaboration. In 2009, the workshop expanded from 2 days to 3 days to include presentations from data users not affiliated with BLS. This allowed users to showcase their experiences with the public use microdata (CE PUMD) files, to discuss problems and successes using the data, and to seek comment and guidance from CE program staff in completing their work.
Starting in 2012, the program office preceded the workshop with an additional 1-day symposium to explore topics in survey methods research in support of the CE Gemini Redesign Project (Gemini Project), a major initiative to redesign the CE (https://www.bls.gov/cex/geminiproject.htm).
In addition to the CE program staff, workshop speakers have included economists from BLS regional offices and researchers not affiliated with BLS. Similarly, symposium speakers have included CE program staff, other BLS National Office staff, and speakers from outside BLS. This article describes the 2017 Survey Methods Symposium, conducted July 18, 2017, and the 2017 Microdata Users’ Workshop, conducted July 19–21, 2017.
Survey methods symposium
The 2017 Symposium presentations focused on four research topics that are key features of the ongoing Gemini redesign initiative, following a similar format to that used in the 2016 Symposium. The four research topics were incentives, record use, online and personal diaries, and data quality. The CE program office invited representatives from other federal, international, and private-sector surveys to share information about their existing methods and experiences on these research topics. The goals of the symposium were (1) to share CE research findings with stakeholders, survey researchers, and data users and (2) to promote a discussion about common challenges and solutions related to CE and other surveys as we try to produce high-quality data in a time of declining response rates, changing respondent behavior, and rising costs of data collection.
The day was divided into four sessions, each centered on one of the four research topics. In each session, a representative from the CE program opened with a presentation on the CE experience, focusing not only on the results of the research but also on the goals related to the topic and the challenges encountered. The CE presentation was followed by short presentations, many of which were given by representatives from other surveys on their existing methods or recently completed research relevant to the topic. At the end of each session, the CE representative moderated a discussion about the topic and the presentations, encouraging presenters and attendees to ask questions and provide comments.
This year, the symposium drew 68 attendees from areas including universities, academic programs in survey methodology, nonprofit organizations, private companies, medical-related establishments, and federal agencies. In the following research topic sections, a review of the presentations is given, followed by a discussion of the combined key takeaways.
The first session was on incentives. The CE redesign plan includes a combination of prepaid and promised conditional monetary incentives. The CE has been testing the efficacy of incentives as part of the redesign effort.
Incentives in the CE interview survey: present findings and future research, Ian Elkin (BLS). Mr. Elkin presented the results from the Incentives Field Test, which was carried out from July 2016 through December 2016 to assess the impact of different combinations of prepaid and conditional incentives. Respondents in the control group received no incentives. There were three treatment groups: the first received a $5 prepaid incentive, a $40 incentive for participation, and an additional $20 incentive if the respondent used records while taking the survey; the second received the prepaid incentive and the $40 participation incentive, but no records incentive; and the third received the $40 participation incentive and the $20 records incentive, but no prepaid incentive. The prepaid incentive was mailed by First-Class Mail along with the advance letter. The conditional incentives were in the form of debit cards. Results indicate that incentives increased response rates, with only slight differences among the groups that received incentives. The group that received all three incentives showed a 4.3 percent gain in response rates, and the group that was not offered the records incentive showed a 5.7 percent gain. Some improvements in data quality were seen, in the form of increased record use, longer interviews, fewer doorstep concerns, and fewer refusal conversions. The redesign plan projected that incentives would significantly reduce the number of contact attempts needed to achieve participation, making the incentives for CE cost neutral. However, the test results found that the reduction in contact attempts was small. In addition, several challenges were encountered during the incentives test, including respondents throwing away the advance letter and therefore not receiving the incentives, and problems with cashing debit cards.
Examining the impacts of prepaid and promised incentives with the National Health Interview Survey (NHIS), Adena Galinsky (National Center for Health Statistics). Ms. Galinsky presented results from the NHIS Incentive Experiment conducted in 2015. The experiment consisted of three treatment groups with different levels and types of monetary incentives, plus a control group. The first group received a $5 bill in the advance letter (prepaid incentive), the second group received up to $40 in debit cards mailed after the interview (conditional incentive), and the third group received both the prepaid and conditional incentives (both incentives). When comparing response rates (using AAPOR definition RR2),3 the three treatment groups were not significantly different from each other or from the control group. However, the two groups that received the promised incentive had higher completion rates (using AAPOR definition RR1)4 than the $5 group, and the group that received both incentives ($45) also had a higher completion rate than the control group. When contacted for survey participation, a higher percentage of the $5 group voiced “doorstep concerns” about privacy, while a lower percentage of the $45 group voiced concerns, compared with the control group. The interview pace was slower for respondents in the promised incentive groups, as compared with the control group. While some positive results were found among the promised incentive groups, and particularly the one that also got the prepaid incentive, the effect sizes were small. NHIS has not implemented either of the incentive structures tested in the experiment.
The second session focused on the use of records as a survey aid. The CE redesign plan calls for an in-person “records interview” with the respondent. The interview is intended to collect data on expenditures that a respondent would likely be able to find and more accurately report using either paper or electronic records. The CE requires granular detail for many expenses and therefore primary records (such as bills and receipts) are important because they contain cost information and itemized breakdowns for items that are expensed together, whereas secondary records (such as bank statements and credit card bills) may not have the required level of detail.
Finding the value of electronic records in the CE, Erica Yu (BLS). Dr. Yu presented results from a small-scale cognitive study on instructing respondents to collect electronic records and discussed ongoing challenges with record use in the CE. The presentation highlighted the reluctance of participants to collect primary records before the interview. The main recommendation was that the type of record (primary vs. secondary) to request should be guided by indications from the respondent about how much energy they are willing to put into the record-collection task. Secondary records may be just as useful for their ability to cue memories and avoid unreported expenditures. Encouraging secondary electronic records collection and the use of paper records for respondents who are not proficient with electronic records is recommended. Another finding from this study was that instructing participants on how to navigate their internet browsers to download and collect electronic records was very difficult because of different instructions being needed for each specific browser. A final recommendation from this study was to shift focus from providing technical instructions to motivating respondents to prepare and organize files by explaining why records are important for the survey. An additional challenge with record use is that respondents might focus exclusively on the expenses that are represented by the records they bring to the interview and not put in the cognitive effort to recall and report on the other expenditures.
The impact of record use in the CE Interview Survey, Taylor Wilson (BLS). Mr. Wilson presented results of his research on record use and rounding in the CE Interview Survey. His presentation focused on measuring the extent of rounding in expenditure data to see if record use leads to less rounding, which would indicate better data quality. Rounding shows up as heaping: many observations concentrated at certain values in the expenditure data, which are the values with the highest probability of having been rounded. To measure rounding, rather than just looking for values divisible by 10, he developed an “average fall” method that identifies heaped values as those expenditure frequency values that are more than two standard deviations from the average fall in a given distribution. His analysis compared the distributions of expenditures reported with records against those reported without records, on the assumption that respondents using records reported the actual rather than a rounded amount. The major finding was that record use had a large effect on rounding for clothes and accessories but no effect on subscriptions. This is what the CE program would expect from the way prices are distributed for these categories.
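The summary above does not give a formula for the “average fall” method, so the following is only one possible reading, offered as a sketch. It assumes that a “fall” is the drop in frequency from one distinct reported amount to the next larger one, and that a value is heaped when its fall exceeds the mean fall by more than two standard deviations. The function name `heaped_values` and the parameter `k` are hypothetical and are not taken from Mr. Wilson's work.

```python
from collections import Counter
from statistics import mean, pstdev


def heaped_values(amounts, k=2.0):
    """Flag amounts whose frequency drops unusually sharply to the next amount.

    Assumed reading of "average fall": fall[i] is the count at the i-th
    distinct amount minus the count at the next larger distinct amount.
    """
    counts = Counter(amounts)
    values = sorted(counts)
    falls = [counts[values[i]] - counts[values[i + 1]]
             for i in range(len(values) - 1)]
    if len(falls) < 2:
        return []  # too few distinct amounts to estimate a spread
    threshold = mean(falls) + k * pstdev(falls)
    return [values[i] for i, fall in enumerate(falls) if fall > threshold]


# Illustrative data: amounts 1..30 reported once each, plus a pile-up at 20.
reported = list(range(1, 31)) + [20] * 14
print(heaped_values(reported))
```

Under this reading, comparing the flagged values for reports made with records and without records would give a rough picture of whether record use reduces rounding.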
Record use in the Medical Expenditure Panel Survey (MEPS), Jeffrey Rhoades (Agency for Healthcare Research and Quality/CFACT/DSO). MEPS is a large panel survey that focuses on health care utilization and expenditure. One respondent reports by proxy for the household, and there are five interview waves. The respondent reports on the use of healthcare services for the household, and MEPS combines the data with additional information from medical providers and other sources. Measurement error has been a concern for MEPS because of proxy reporting and a long recall/reporting period between waves. Furthermore, survey questions involve terminology that may not be familiar to all respondents. Record use is encouraged, as it can help improve data quality.
Dr. Rhoades presented some facts about MEPS and discussed a 2014 data quality intervention that involved training, monitoring interviewers, and improving respondent materials. This intervention led to increased use of records, specifically key records like bills and receipts.
Approaches for improving record use include training respondents and interviewers, monitoring interviews, and interviewers providing weekly updates and feedback. MEPS has moved to regular in-person training of interviewers every 3 years as a sign of the importance it places on this component. MEPS provides a “Tip Sheet” for respondents, informing them of what medical categories to collect records for and how to prepare records. MEPS identifies key records (primary records) vs. other (secondary records) and emphasizes the importance of using key records. MEPS interviewers are trained to tailor the approach of record collection to each household. They will reschedule an interview if they determine that the most knowledgeable household member would be willing to participate at a later time. MEPS also provides an example expenditure worksheet that respondents can fill out.
The third session focused on the use of online diaries. A major component of the CE redesign plan is the introduction of an online diary option for respondents to complete the diary-keeping task. This option is an alternative to the existing CE paper-and-pencil diary. The redesign also calls for the use of “personal” diaries given to each household member age 15 and older, rather than the current “household” diary in which one respondent fills in the diary for the household.
Factors that affect reporting in online and personal diaries, Brett McBride (BLS). Mr. McBride’s presentation summarized recent research into the Proof of Concept (POC) test’s fielding of personal online diaries. The research found that most eligible members of the 520 households completing the POC made at least some entries into their diaries. Households where everyone participated did not have more entries than those where some participated, nor did personal diaries appear to increase the number of entries over comparable CE survey production diaries. Attendance at interviewer diary instruction sessions was associated with significant increases in expenditures reported in the diaries. Online diaries were provided to households with more educated and younger members. Although overall completion rates were slightly lower for online diaries compared with paper diaries, examination of alcohol purchases suggests that online diaries may have been used to capture more “socially undesirable” expenses.
Online diaries for everyone: data quality, device usage, and compliance with personal expenditure diaries, Douglas Williams (Westat). Mr. Williams presented the online diary developed by Westat for the CE and the main results of the small usability test with a recruited sample conducted as part of the Online Diary Improvement Project. Results indicate better cooperation from the main household diarist (household respondent), but lower cooperation from other household members. Entries were also timelier for the main household diarist: the lag between when an expenditure occurred and when it was recorded was shorter than for other household members. Midweek reminder calls were beneficial to data quality, as they led to a spike in reporting on the fourth day of the diary week. Respondents’ choice of mobile vs. desktop or laptop modes may have depended on how many items they bought, how many stores they visited on a shopping trip, or how much information was available on store receipts.
Respondents reported some challenges with the online diary. Interviewers offered to add “short links” to all available devices so that respondents would have easy access to the online diaries. Despite that, password requirements were challenging for many respondents, and some had difficulty establishing a password that satisfied the security requirements. Some respondents found the mobile version of the diary more convenient, but others avoided it because they perceived the small screen and lack of a tactile keyboard as inconvenient. Recruitment and training of other household members need to be addressed in future tests.
Use of GPS devices to enhance travel behavior diaries, Josh DeLaRosa (Abt Associates). Travel diary surveys usually involve collecting information from all household members on the address, time, purpose, and mode of travel. Researchers provide respondents a GPS logger or smartphone app that passively collects GPS information on location at defined time intervals, and this can be combined with travel diary data. Alternatively, the diary and GPS can be integrated to prompt recall. This presentation compared results from studies that used these approaches and discussed some limitations of current GPS technology.
GPS data enhancements can be used to mitigate item nonresponse and measurement error. The largest benefit is in pinpointing underreporting, as the GPS loggers can detect locations the respondent might have overlooked. In one of the studies discussed in the presentation, GPS loggers were used to collect spatial and temporal data, and machine learning was used to detect stops. These stops were prefilled into the travel diary, after which the respondent verified or edited the information for accuracy. The main barriers to using GPS technology are cost and privacy. Costs include developing the application, shipping loggers to respondents who may not have a smartphone, and 3G service for respondents; privacy concerns lead to lower recruitment rates. Rural and certain urban areas may have poor service that could lead to missing information.
The final session focused on data quality. Improving data quality is the main goal of the CE redesign, and as part of the effort to measure data quality, CE has been developing a Data Quality Profile and measures of respondent burden to enable the survey to monitor data quality over time.
Developing a data quality profile for the Consumer Expenditure Survey, Yezzi Angi Lee (BLS). Ms. Lee’s presentation provided an overview of the Data Quality Profile at CE. CE has been developing metrics to monitor survey data quality beyond the Total Survey Error (TSE) components, to include dimensions such as timeliness, accessibility, and interpretability. These are part of the Data Quality Profile, which CE envisions as the primary reporting format in production, to serve as an integrated single source of information on the quality of CE data for internal CE staff. A subset of that information will also be released to external users of CE data, to provide access to information on data quality.
The first iteration of the Data Quality Profile focused on metrics such as response and nonresponse rates, expenditure edit rates, and income imputation rates, which reflect measurement error and nonresponse error. The second iteration of the Data Quality Profile (prototype 2) scaled up from the first version as the team updated existing metrics and added new ones, such as the use of records by survey mode.
Evaluating respondents’ burden via indirect indicators of data quality: item vs. index scores, Daniel K. Yang (BLS). In the CE, respondents answer a series of questions about respondent burden that can be combined into an index to track respondent burden over time. The primary objective of Mr. Yang’s presentation was to compare the performance of a single burden item with a composite burden index, as a data quality measure for the CE. Indirect indicators of data quality associated with a single burden item were compared with those indicators for the composite burden index using burden questions from the 2012 and 2013 CE Research Sections.
The research demonstrated that the correlations between indirect indicators of data quality and the single burden item are not different from the correlations between indirect indicators of data quality and overall burden index scores. Therefore, a single burden item can be used as an indicator of a respondent’s perceived burden and a composite index is not required for monitoring respondent burden over time.
Reducing respondent contact burden in the ACS using a cumulative burden score, Robert Ashmead (U.S. Census Bureau). Dr. Ashmead presented results from research conducted at the Census Bureau on reducing contact burden in the American Community Survey (ACS). The ACS uses a multimode data collection strategy in which nonrespondents can be contacted by mail, by telephone, and in person. The concept of respondent contact burden measures respondent burden from multiple possible contact attempts. This presentation focused on how to implement stopping rules based on a cumulative burden score. Each contact attempt is assigned a score related to its perceived burden to the respondent and based on the type and result of the attempt. For each case, the cumulative burden score is tracked, and the case is stopped (no further contact attempts made) when the cumulative score reaches a stopping threshold. The threshold for stopping used in the 2015 ACS pilot was based on the 95th percentile of historical data, and it resulted in approximately 4 percent of the sample being stopped and a 1 percent decrease in response rate. Dr. Ashmead also discussed how this approach is being implemented in the ACS.
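The stopping rule described above can be sketched in a few lines. This is only an illustration of the general idea: the point values in `BURDEN_POINTS`, the attempt labels, and the threshold of 12 are hypothetical and are not the scores or threshold used in the ACS pilot.

```python
# Hypothetical burden points by (contact type, result); not actual ACS values.
BURDEN_POINTS = {
    ("mail", "no_response"): 1,
    ("phone", "no_answer"): 2,
    ("phone", "refusal"): 6,
    ("visit", "no_contact"): 4,
    ("visit", "refusal"): 10,
}


def attempts_until_stopped(attempts, threshold):
    """Return (attempt number at which the case is stopped, cumulative score).

    The attempt number is None if the threshold is never reached.
    """
    total = 0
    for number, attempt in enumerate(attempts, start=1):
        total += BURDEN_POINTS[attempt]
        if total >= threshold:
            return number, total  # no further contact attempts for this case
    return None, total


history = [
    ("mail", "no_response"),
    ("phone", "no_answer"),
    ("phone", "refusal"),
    ("visit", "no_contact"),
    ("visit", "refusal"),
]
print(attempts_until_stopped(history, threshold=12))
```

In this made-up case the cumulative score reaches the threshold on the fourth attempt, so the fifth attempt would not be made. In practice the threshold would come from historical data, as in the pilot's use of the 95th percentile.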
Summary of symposium
With many decisions still to be made for the large-scale feasibility test of the survey redesign planned for late 2019, the CE program office is grateful to the external presenters who shared their experiences with some of the key topics that are being considered. The symposium served as a channel for discussing and exchanging ideas to help the CE program move closer to achieving its overall redesign goals. A selection of the key takeaways for CE from those discussions follows:
- An interesting takeaway from the session on incentives was that neither study found incentives effective in reducing the number of contact attempts needed to secure cooperation. Incentives were found to increase response rates in the CE incentives test but not in the NHIS study. The NHIS study found that some respondents who only received a $5 token incentive were offended by being offered such a small amount. In the CE incentives test, we did not include a group that only received the $5 prepaid incentive and did not receive any negative feedback on the token incentive as part of an incentive package. This provides some evidence that CE should continue to test the use of token incentives for the CE redesign.
- While there was clear evidence of record use improving data quality as indicated by the reduction in rounding of expenditures in the CE, record use presents many challenges for respondents and interviewers. The presentation on recent initiatives in the MEPS emphasized the importance of interviewer training, development of respondent materials, and adopting a tailored approach to the individual household. These are important elements for CE to emphasize in the redesign protocols and materials for collecting records-based data.
- The presentations on online diaries highlighted some of the benefits and challenges of adopting new technologies. The online diary developed by Westat performed very well in usability testing. However, several logistical issues need to be addressed before it is implemented in the CE. Many respondents reported difficulties with password requirements and with logging in. Also, participation by household members other than the main diary keeper was limited if they were not present when the diary was placed in the household.
- The Abt Associates presentation on GPS devices highlighted the importance of GPS and other technological innovations that could enhance the usability and data quality from mobile diaries in the future. For instance, CE could explore the use of GPS to get location information to supplement CE data collection, validate the data, or collect outlet information.
- Two of the data-quality presentations emphasized the measurement of respondent burden. The CE has added questions on respondent burden, and the CE presentation emphasized that a single burden item will suffice to track respondent burden over time, to the degree that it correlates with data quality. The Census presentation highlighted an innovative measure called the cumulative burden score that is being used to tailor stopping rules for data collection in the ACS. This is relevant to the CE redesign as it explores different strategies for reducing respondent burden, such as reducing the length and level of detail in the interview and reducing the number of interviews and contact attempts.
Microdata users’ workshop
Meet with an expert: Held in 2017, the 12th annual workshop included an innovation called the “Meet with an expert” program. The purpose was to provide an opportunity for attendees to have in-depth, one-on-one meetings with members of the CE staff, wherein the attendees could ask questions and receive comments or other guidance about the projects in which they were engaged. While this opportunity has been provided informally at past workshops, this year the program was formally announced via email and web posting. In addition, attendees were able to sign up for a meeting by checking a box on their registration forms. They could also sign up at the registration desk throughout the workshop. The main benefit—both to attendees and CE staff members—of advance registration was to allow the coordinator time to find the most appropriate expert, and time for the expert to investigate the question or prepare other information (handouts, etc.) before the meeting to optimize the quality of the session.
Based on comments from participants, the program was a great success. Therefore, it will be repeated in future workshops.
Day one: The first session of the 2017 workshop opened with presenters from the CE program. Program Manager Adam Safir provided an overview of the CE, featuring topics including how the data are collected and published. Economist Jimmy Choi then presented an introduction to the microdata, including how they can be used in research and the types of documentation about them available to users. Economist Taylor Wilson completed the session with a description of data file structure and variable naming conventions.
After a break, attendees received their first practical training with the data. In this session, they learned basic data manipulation, including how to compute means from the microdata and how to integrate results from the Interview and Diary surveys. They also learned about a topic of perennial interest to CE microdata users: caveats concerning the use of data only from respondents who complete all four interviews of the Interview Survey.5 This session started in the morning and ran through the early afternoon, with a lunch break in between.
Following the training session, researchers not affiliated with the CE program completed the afternoon activities. Dr. Catherine Curtis, the first speaker, described work in its preliminary stages on exploring patterns of expenditures for travel (e.g., vacation) for families from 2005 through 2015. Of particular interest was that Dr. Curtis was a first-time user of the CE microdata and was working on this project in consultation with a CE staff member (Geoffrey Paulin). Her coauthor, Dr. Li Miao (Oklahoma State University), presented at the 2016 workshop.
The next speaker was Ryan Pfirrmann-Powell of the U.S. Department of Agriculture Farm Service Agency. Mr. Pfirrmann-Powell, formerly an economist with the CE program, spoke about the estimation of elasticities of demand for fresh, fluid milk using data from the Diary Survey.6
Following a break, Heather Lamoureux, a Senior Research Analyst at Clarity Services, presented work coauthored with Rick Hackett, also of Clarity Services, on a proposed model of borrowers’ “ability to pay” for certain short-term or small-dollar loans, in compliance with a new regulation proposed by the Consumer Financial Protection Bureau.
The final speaker of the first afternoon was Dr. Sita Slavov, who presented research exploring whether Social Security benefit and tax changes in the early 1980s affected saving and life insurance holdings of people born in 1938 or later. Part of the reform increased the retirement age for members of this group.
Following Dr. Slavov’s presentation, the afternoon concluded with an informal gathering of attendees immediately outside the workshop room. The purpose of this event was to provide networking opportunities for attendees—both to meet each other and to initiate or renew contacts with staff of the CE program.7
Day two: The second day opened with more advanced topics. First, Senior Economist Aaron Cobet (CE program) explained the need to balance confidentiality concerns of respondents with usefulness of the data to researchers. Because Title 13, U.S. Code, requires confidentiality of response, information that might potentially identify specific respondents must be removed from the data before they are released publicly. Some identifiers are direct, such as names and addresses. Others are indirect, such as extremely high expenditures or make and model of automobile(s) owned.
Mr. Cobet explained methods used in the production of the CE microdata files to address these concerns. The first method, called “topcoding,” applies to reported values for income or expenditures that exceed a certain threshold, called the “critical value.” These values are replaced by an average of all values exceeding this threshold and then “flagged” as topcoded (or “bottom-coded,” in the case of large income losses).8 He also explained recoding, in which data are either made less precise (e.g., if the owned automobile was produced in 1999, the year is replaced with the decade of manufacture, i.e., “1990s”) or changed in another way (e.g., state of residence is changed from Delaware to New Jersey) to preserve both comparability and confidentiality. Mr. Cobet next explained suppression, in which reported values are removed from the data set. In some cases, only specific information is suppressed on a record (e.g., details of a specialized mortgage). In other cases, the entire record is removed (e.g., report of a purchase of an airplane).9 Finally, Mr. Cobet talked about methods to eliminate “reverse engineering,” a process through which the user could deduce protected information from other information provided in the publicly available files.10
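As a rough illustration of topcoding as summarized above, the sketch below replaces every value above a critical value with the mean of all such values and records a flag for each replaced entry. The function name `topcode`, the sample incomes, and the critical value of 250,000 are hypothetical. The actual CE production procedure and flag codes are not described here and may differ.

```python
def topcode(values, critical_value):
    """Replace values above critical_value with their mean; return (coded, flags)."""
    above = [v for v in values if v > critical_value]
    if not above:
        return list(values), [False] * len(values)
    replacement = sum(above) / len(above)
    coded = [replacement if v > critical_value else v for v in values]
    flags = [v > critical_value for v in values]  # True marks a topcoded entry
    return coded, flags


# Hypothetical reported incomes and critical value, for illustration only.
incomes = [50_000, 120_000, 300_000, 500_000]
coded, flags = topcode(incomes, critical_value=250_000)
print(coded)
print(flags)
```

Because the replacement is the mean of the values it replaces, the total (and so the unweighted mean) of the series is unchanged, while the individual extreme values can no longer be read from the file.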
Next, statistician Brian Nix of the BLS Division of Price Statistical Methods (DPSM) presented technical details about sampling methods and construction of sample weights, and statistician Susan King (DPSM) presented results of her research into producing experimental weights for estimating state-level expenditures with the use of the CE microdata.11
The rest of the morning was allocated to research presentations and practical training. In the first research presentation, Dr. Michael Conte and his coauthor Keith Myers (both from RegionalOneSource) described a user-friendly online tool that they have developed. The tool allows other researchers to obtain information about the data in a given year or over several years from one source. For example, if a user is looking for the component expenditures in a larger expenditure category, or how the category composition has changed, or even the names of variables associated with a particular expenditure, the tool allows the researcher to find this information without having to consult and search multiple PDF files on the CE PUMD website.
Next, Megan Sweitzer (USDA) described her research comparing food expenditures from CE for a variety of items to those collected in scanner data.
Completing the session, the practical training demonstrated how to find certain nonexpenditure information from detailed PUMD files.12
After a break for lunch, Terry Schau, managing editor of the Monthly Labor Review (MLR), described the MLR publication process, from submission to posting, for authors interested in having their work appear in the MLR. Following this presentation, Economist Arcenis Rojas (CE program) demonstrated an interactive visualization tool he developed to allow data users an easy way to explore microdata. For example, by selecting from a short list of demographic characteristics (e.g., region), the tool produces graphs demonstrating average annual expenditures for preselected items, such as housing and food at home, for the characteristics selected. The means are displayed for each selected group in bar charts to allow for easy comparison across the groups.
In the next presentation, Senior Economist Geoffrey Paulin (CE program) described the correct use of sample weights in computing population estimates. He noted that the proper use of weights requires a special technique to account for sample design effects; if this technique is not employed, the resulting estimates of variances and regression parameters are incorrect. This led into a practical training session devoted to computing weighted results.
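The report does not name the technique. One common approach for complex samples, assumed here for illustration, is to recompute the estimate under a set of replicate weights and to measure how far the replicate estimates spread around the full-sample estimate. The Python sketch below shows a weighted mean and this replicate-based variance. All expenditures and weights are hypothetical, and only two replicates are used to keep the arithmetic visible.

```python
import math


def weighted_mean(values, weights):
    """Population-weighted mean: sum(w * x) / sum(w)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def replicate_variance(values, full_weights, replicate_weights):
    """Mean squared deviation of the replicate estimates from the full-sample estimate."""
    full_estimate = weighted_mean(values, full_weights)
    replicate_estimates = [weighted_mean(values, rw) for rw in replicate_weights]
    squared_deviations = [(r - full_estimate) ** 2 for r in replicate_estimates]
    return sum(squared_deviations) / len(replicate_estimates)


# Hypothetical expenditures for four consumer units and their full-sample weights.
expenditures = [100.0, 200.0, 300.0, 400.0]
full_weights = [3000.0, 1000.0, 1000.0, 1000.0]

# Hypothetical replicate weights: each replicate drops half the units
# (weight 0) and doubles the weights of the rest.
replicate_weights = [
    [6000.0, 0.0, 2000.0, 0.0],
    [0.0, 2000.0, 0.0, 2000.0],
]

unweighted = sum(expenditures) / len(expenditures)
weighted = weighted_mean(expenditures, full_weights)
variance = replicate_variance(expenditures, full_weights, replicate_weights)
standard_error = math.sqrt(variance)

print(unweighted, weighted, variance, standard_error)
```

In this toy example the weighted mean (200) differs from the unweighted mean (250), and the standard error comes from the replicates rather than from a simple-random-sample formula.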
The afternoon concluded with two research presentations. The first presenter was Louis Poirier (Bank of Canada), who spoke of his research regarding expenditure changes in response to the “oil shock” of
The second presenter was Ph.D. candidate Dmitri Koustas (University of California-Berkeley), who examined the relationship of consumption inequality and frequency of purchase. The work finds that inequality may be rising because of decreased frequency of purchase. That is, even if two groups spend the same over the course of a certain period, if one group “stocks up” (i.e., purchases larger quantities in fewer trips) more than the other, there may appear to be an inequality because the stocking up occurred outside the scope of time the survey covers.
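The stocking-up argument can be made concrete with a small Python sketch. The weekly purchase amounts are hypothetical. The 2-week observation window mirrors the Diary Survey's 2 consecutive weeks of recording, and the function name is illustrative.

```python
def observed_spending(weekly_purchases, start_week, window_weeks=2):
    """Total purchases that fall inside the survey's observation window."""
    return sum(weekly_purchases[start_week:start_week + window_weeks])


# Two hypothetical households that each spend $400 over four weeks.
steady_shopper = [100, 100, 100, 100]   # buys the same amount every week
stock_up_shopper = [400, 0, 0, 0]       # buys everything in the first week

total_steady = sum(steady_shopper)
total_stock_up = sum(stock_up_shopper)

# A 2-week window covering weeks 3 and 4 (index 2 onward) misses the stock-up trip.
late_window_steady = observed_spending(steady_shopper, start_week=2)
late_window_stock_up = observed_spending(stock_up_shopper, start_week=2)

# A window covering weeks 1 and 2 captures the whole stock-up trip instead.
early_window_stock_up = observed_spending(stock_up_shopper, start_week=0)

print(total_steady, total_stock_up)
print(late_window_steady, late_window_stock_up, early_window_stock_up)
```

Both households spend the same over four weeks, yet the window records $200 for one and either $0 or $400 for the other, depending on when it falls.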
Day three: The final day started with CE staff discussing advanced topics. First, Economist Barbara Johnson-Cox explained how sales taxes are applied to expenditure reports during the data production process. Then, Economist Clayton Knappenberger spoke on imputation and allocation of expenditure data in the CE.
Next, a panel of three outside researchers, moderated by Dr. Paulin, addressed research related to the Supplemental Nutrition Assistance Program (SNAP), formerly known as the Food Stamps Program. The first panelist, Lisa Boily (BLS New York-New Jersey Information Office), described research in which she and her coauthors examined changes in characteristics and spending patterns of food stamps and SNAP participants over a 10-year period (2006 through 2015) that was specifically selected to include the 2007–09 recession. The second panelist, Dr. Jiyoon Kim (Indiana-Purdue University, Fort Wayne), examined how changes in SNAP benefits affected spending for items other than food. The third panelist, Ph.D. student Madelaine L’Esperance (University of Wisconsin, Madison), described how she used the data in a graduate class assignment in which students were instructed to replicate a published empirical economic study on a topic of their interest. In addition to describing the research findings, the panelists discussed their experiences using the data—what was most useful or most limiting about the data,
The final presentation of the morning was delivered by Jonathan Peters (College of Staten Island/CUNY Graduate School), who examined expenditures on tolls and similar highway-access fees by income group. His presentation was followed by a lunch break, after which the remaining presentations originally scheduled for the morning were delivered.
The first of these was a “sneak peek” of developments for CE publications and microdata. CE Information and Analysis Branch Chief Steve Henderson noted that starting August 29, 2017, the CE program would promote to standard production a previously experimental table showing expenditures by generation of the reference person (e.g., Millennial, Generation X, etc.) to supplement the standard age tables (e.g., under 25, 25 to 34, etc.). Later, new experimental income tables would be posted, in which cross-tabulated data are available for higher income groups than are currently available in standard published tables.13 In addition, of particular interest to microdata users, he noted the upcoming release of more detailed geographic data (i.e., by nine Census divisions in addition to the Northeast, Midwest, South, and West regions). Mr. Henderson asked for researcher help in assessing the impact of new rounding strategies that have been proposed to protect confidentiality. (For example, expenditures under $10 will be rounded to the nearest penny and those between $10,000.00 and $99,999.99 will be rounded to the nearest $100.)
Continuing the “sneak peek” theme, Dr. Paulin described work in progress within the CE program to impute data for assets and liabilities when receipt, but not values, was provided for various items. This led into a practical training session in which he described the correct methods for analyzing the multiply imputed income data.
The final part of this training session was devoted to the computation of calendar year population expenditure estimates. These computations require use of weights described earlier in the workshop. The training was followed by a forum in which attendees were debriefed to solicit their opinions on how to improve future workshops. The day, and the workshop, concluded with a final special topics training session. This included meetings with experts and a description of a computer program available with the microdata for SAS software users. This program will help CE microdata users to compute correct standard errors for means and regression results easily when using (1) unweighted nonimputed data, (2) population-weighted nonimputed data, and (3) multiply imputed income data, both unweighted and population weighted.
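The report does not spell out how results from multiply imputed income data are combined. A standard approach, assumed here, is Rubin's combining rules: average the estimates across imputations, then add the average within-imputation variance to an inflated between-imputation variance. The Python sketch below is not the SAS program mentioned above. The five estimates and variances are hypothetical.

```python
import math


def combine_multiply_imputed(estimates, variances):
    """Combine per-imputation estimates and variances with Rubin's rules.

    Returns (point_estimate, standard_error).
    """
    m = len(estimates)
    point_estimate = sum(estimates) / m
    within = sum(variances) / m
    between = sum((e - point_estimate) ** 2 for e in estimates) / (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return point_estimate, math.sqrt(total_variance)


# Hypothetical mean income (in thousands of dollars) from five imputed data sets,
# each with its own estimated sampling variance.
imputed_means = [50.0, 52.0, 48.0, 51.0, 49.0]
imputed_variances = [4.0, 4.0, 4.0, 4.0, 4.0]

point_estimate, standard_error = combine_multiply_imputed(imputed_means, imputed_variances)
print(point_estimate, standard_error)
```

Using only one imputed value, or averaging the imputations before analysis, would omit the between-imputation term and understate the standard error.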
Symposium and Workshop of 2018
The next Survey Methods Symposium will be held July 17, 2018, in conjunction with the 13th annual Microdata Users’ Workshop (July 18–20). Although the symposium and workshop will remain free of charge to all participants, advance registration is required. For more information about these and previous events, visit the CE website (https://www.bls.gov/cex/) and under the left navigation bar, titled “CE PUBLIC USE MICRODATA,” look for “ANNUAL WORKSHOP.” For direct access to this information, the link is https://www.bls.gov/cex/csxannualworkshop.htm. Links to the agendas for the 2017 workshop (https://www.bls.gov/cex/ce-2017-workshop-agenda.pdf) and the 2017 symposium (https://www.bls.gov/cex/ce_2017_symposium_agenda.pdf) are also available on this webpage. Both agendas include links to presentations delivered at the respective events.
Highlights of workshop presentations
The following are highlights of the papers presented during the workshop, listed in the order of presentation. They are based on summaries written by the respective authors.
Catherine Curtis, Ph.D., Oklahoma State University, “Family travel expenditures, 2005–15: patterns in consumer family travel” (Interview Survey), day one.
This work in progress examines family travel expenditures in the 2005–15 period. It reveals patterns of behavior in spending, destinations, and length of stay in a period of time that started in a recession and ended in the subsequent recovery period. It is expected that family travel spending did not cease in the recession period, but effects such as destination choice, method of transportation, and length of stay will be examined in the recession and recovery periods. Because certain crucial variables, such as destination, are not publicly available, completion of this work will require coauthorship with BLS staff.
Ryan Pfirrmann-Powell, U.S. Department of Agriculture Farm Service Agency, “Estimating elasticities of demand from the Consumer Expenditure Diary Survey” (Diary Survey), day one.
Models describing agricultural commodity markets rely heavily on the relative abundance of delayed, supply-side data. This presentation details the process I used to estimate how demand for a product—in this case fresh, fluid milk—responds to various economic and demographic factors. I then consider how linear modeling of households’ decisions to consume milk might improve our understanding of commodity markets. The emphasis will be on the process used to construct a model from Consumer Expenditure Diary Survey data, as well as federal price and household alimentary survey data for preliminary analysis, and a discussion of the challenges, limitations, and appropriate interpretation of estimates derived from the complex household sample design.
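As a reminder of what is being estimated, elasticity is the ratio of percent changes, as defined in the notes to this report. The Python sketch below computes a price elasticity of demand from two hypothetical price and quantity observations. It illustrates the definition only and is not the presenter's estimation model.

```python
def percent_change(old, new):
    """Percent change from old to new."""
    return (new - old) / old * 100.0


def price_elasticity(quantity_old, quantity_new, price_old, price_new):
    """Percent change in quantity divided by percent change in price."""
    return percent_change(quantity_old, quantity_new) / percent_change(price_old, price_new)


# Hypothetical milk market: price rises 1 percent, quantity purchased falls 0.5 percent.
elasticity = price_elasticity(
    quantity_old=100.0, quantity_new=99.5,
    price_old=4.00, price_new=4.04,
)
print(round(elasticity, 3))
```

An elasticity between 0 and −1, as in this made-up case, would indicate that quantity responds less than proportionally to price.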
Heather Lamoureux, Clarity Services, “Ability-to-Pay: using CE microdata to proxy borrower expenses” (Interview Survey), day one.
NonPrime101, the research arm of Clarity Services Inc., modeled an Ability-to-Pay (ATP) process with actual administrative borrower data. The purpose was to find a cost-effective, nonprohibitive, automated way to model this requirement. Additionally, once we ran the model on actual deidentified borrower data, including the expenses needed to do a full ATP model, we determined the effect of the proposed regulations on the industry as a whole. We applied the Consumer Financial Protection Bureau’s (CFPB’s) proposed methodology to compute residual income after payment of debt service obligations and used the income remaining to cover a new loan payment and pay basic living expenses (as defined by the CFPB proposal). Where the consumer reports included debt payments for shelter (a mortgage payment) or an auto loan, we used those values. In all other cases, we proxied expenses based on data from the BLS CE microdata and the U.S. Census Bureau, both sources endorsed in CFPB’s proposal. The microdata is segmented based on income and age of the consumer, and we used those segmentations. The microdata is also based on a “consumer unit” of multiple income earners in a household. Borrower income reported in the Clarity system we used is individual income data. Accordingly, expense data was prorated based on the number of income earners in a “consumer unit” in the relevant segment in the microdata.
Sita Slavov, Ph.D., George Mason University, “Social Security and saving: an update” (Interview Survey), day one.
Typical neoclassical life-cycle models predict that Social Security has a large and negative effect on private savings. Theory also suggests that the Social Security dependents’ benefits paid to children of deceased workers crowd out private life insurance holdings. We use CE data to investigate the impact of two policy changes from the early 1980s on both private saving and life insurance holdings. The first policy change we examine is the 1983 Social Security reform. This reform increased payroll taxes for self-employed individuals relative to wage earners. It also increased the full retirement age (a change that is equivalent to a benefit cut) for individuals born in 1938 and later. Using a difference-in-difference approach, we examine the impact of these changes on the savings of self-employed individuals relative to wage earners and individuals born in 1938 and later relative to individuals born before 1938. The second policy change we examine is a 1981 reform that reduced dependents’ benefits paid to children. Using a difference-in-difference approach again, we examine the impact of this policy change on the life insurance holdings of households with children living at home versus other households. Our preliminary results use the NBER extracts of the CE microdata created by Harris and Sabelhaus (2000), which merge the quarterly interviews for each family and aggregate spending, income, and wealth variables into broad categories that are consistent across years. We find weak evidence that the payroll tax increase may have reduced saving among the self-employed. However, we find no evidence that the increase in the full retirement age or the cut in dependents’ benefits reduced saving or life insurance holdings.
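The difference-in-difference logic can be shown in its simplest form with a short Python sketch: the change for the affected group minus the change for the comparison group. The saving rates are hypothetical and the function name is illustrative. The study itself would estimate this with regression controls rather than four group means.

```python
def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the control group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical mean saving rates (percent of income) before and after a reform.
did_estimate = difference_in_differences(
    treated_before=10.0, treated_after=8.5,   # e.g., self-employed individuals
    control_before=9.0, control_after=8.5,    # e.g., wage earners
)
print(did_estimate)
```

With these made-up numbers, saving falls by 1.5 points for the treated group and by 0.5 points for the control group, so the estimate attributes a 1.0-point decline to the policy change.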
Michael Conte, Ph.D., and Keith Myers, RegionalOneSource (ROS), “Consumer Expenditure Microdata user support website” (Interview Survey), day two.
In this presentation, Conte and Myers provided an introduction to the website they developed that provides on-the-fly interactive documentation of the Consumer Expenditure Survey microdata.
At its most rudimentary level, the website allows users to search for variables using a strict or broad search. The strict search finds variables by name, using autocomplete as an intermediary function in the search process. The broad search allows users to search for keywords that appear in the variable description field, and returns all variable names that are associated with the search term.
Having identified a variable of interest, users can then pursue many types of information about their highlighted variable, including variable description, tables that provide data for their chosen variable, and various types of metadata about their chosen variable.
One of the premier features of the website is that it provides a unique insight into parent-child relationships in the microdata. Having identified a particular variable using the strict or broad search algorithm, users can then query the website to provide a listing of the ancestral and child lineage of their selected variables. So, for example, after identifying variable XYZ in the data, a follow-on query shows that XYZ is a child of variable ABC and a parent of variables DEF, GHI, and JKL, which in turn are parents of numerous other stipulated variables. This information can be of value to users of the microdata for obvious reasons and is a useful extension of the metadata provided in BLS’s official microdata documentation.
Users can also search for table names and, having found a table of interest, can list numerous features of the table including whether the table contains information from the interview vs. diary, and a list of variables located in the table organized by groupings (e.g., consumer unit characteristics, income, expenditures, etc.). The website also provides users with the ability to output the results of their searches to Excel tables for use outside the application, as well as other abilities that would be explained during the presentation. RegionalOneSource intends to provide free and unlimited access to this website to all interested users.
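The parent-child lookup described above can be sketched in Python with a simple mapping from each variable to its parent. The names XYZ, ABC, DEF, GHI, and JKL are the placeholders used in the summary. DEF_1 is a hypothetical grandchild added for illustration, and the functions are not the website's actual implementation.

```python
# Child -> parent mapping; placeholder names, not real CE variable names.
PARENT_OF = {
    "XYZ": "ABC",
    "DEF": "XYZ",
    "GHI": "XYZ",
    "JKL": "XYZ",
    "DEF_1": "DEF",  # hypothetical grandchild of XYZ
}


def ancestors(variable, parent_of):
    """Walk upward from a variable, returning its parent, grandparent, and so on."""
    chain = []
    while variable in parent_of:
        variable = parent_of[variable]
        chain.append(variable)
    return chain


def children(variable, parent_of):
    """Direct children of a variable, in alphabetical order."""
    return sorted(child for child, parent in parent_of.items() if parent == variable)


def descendants(variable, parent_of):
    """All variables below a given variable, listed depth first."""
    found = []
    for child in children(variable, parent_of):
        found.append(child)
        found.extend(descendants(child, parent_of))
    return found


print(ancestors("XYZ", PARENT_OF))
print(children("XYZ", PARENT_OF))
print(descendants("ABC", PARENT_OF))
```

Storing only the child-to-parent link keeps the data small, and both the upward and downward lineage can be derived from it on request.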
Megan Sweitzer, U.S. Department of Agriculture Economic Research Service, “Comparing food-at-home expenditures: commercial scanner data and government survey data” (Diary Survey), day two.
USDA uses commercial food scanner data in economic research, but little documentation is available for these data sets of consumer food expenditures. Therefore, we compared the IRI Consumer Network household scanner data with nationally representative surveys of food expenditures—the CE Diary Survey and the USDA’s Food Acquisition and Purchase Survey (FoodAPS)—to better understand the coverage and representativeness of the scanner data. We categorized foods from each survey to align with CE food categories and estimated total and mean weekly expenditures in 18 food-at-home categories over 5 years (2008–12). We also compared expenditures by category for a number of demographic subpopulations. The results show how CE food expenditures compared with household scanner data and with the USDA’s new FoodAPS survey, which collected comprehensive household food purchase data using a combination of scanners and diaries.
Louis Poirier, the Bank of Canada, “Analysis of the impact of lower oil prices on American household consumption” (Interview Survey), day two.
The impact of the oil price shocks on the U.S. economy is a topic of considerable debate among economists. In this paper, we examine the response of U.S. consumers to the 2014–15 negative oil price shock using representative survey data from the CE. We propose a difference-in-difference identification strategy based on a plausibly exogenous factor, motor vehicle ownership, which generates variation in exposure to the shock across consumers. Based on this, we explore whether highly exposed consumers increased consumption or increased savings in response to the shock. Preliminary evidence suggests that consumers significantly increased consumption of both oil and non-oil-related goods, suggesting that the U.S. marginal propensity to consume (MPC) out of oil price savings is high. The influence of other factors, such as mortgage status and household indebtedness, is also explored.
Dmitri Koustas, Ph.D. candidate, UC-Berkeley, “Consumption inequality and the frequency of purchases” (Interview and Diary Surveys), day two.
Many researchers have used the CE in an attempt to document consumption inequality. We argue that this approach is potentially complicated by changes in shopping patterns. Combining the CE, time use, and AC Nielsen data, we document that spending inequality and consumption inequality have departed from each other in recent years. Our results suggest that almost all the rise in measured inequality in the CE can be explained by changes in shopping patterns.
The paper makes two contributions that will be useful for future researchers. We construct a bridge between the CE Interview Survey/Diary Survey and the AC Nielsen data. We also document that changes in the Diary Survey methodology implemented in 2004 resulted in a large reduction in the standard deviation of spending in the diary data.
Lisa Boily, BLS New York–New Jersey Information Office, “Using food stamp identifiers in the CE Diary Survey: opportunities and challenges” (Diary Survey), day three.
This research explores the change in the demographic mix of food stamp beneficiaries through the Great Recession and then examines food expenditure patterns to determine the relative importance of select food categories to both the general consumer and food stamp beneficiaries.
Jiyoon (June) Kim, Ph.D., Indiana-Purdue University, Fort Wayne, “Changes to low income households’ spending patterns in response to the 2013 SNAP benefit cut” (Diary Survey), day three.
The American Recovery and Reinvestment Act (ARRA) increased Supplemental Nutrition Assistance Program (SNAP) benefits significantly in April 2009 in response to the Great Recession. The higher benefit levels were expected to remain through 2014, but congressional action resulted in an early expiration on November 1, 2013. The purpose of this project is to examine the extent to which this SNAP benefit cut affected food expenditure as well as nonfood expenditure of SNAP participants, using CE 2012 to 2014. We made use of the panel structure of CE to investigate the change in expenditures of the same households over the course of quarterly interview surveys, specifically those who were interviewed both before and after November 2013.
Madelaine L’Esperance, Ph.D. candidate, University of Wisconsin-Madison, “Replicating results from published work: an example based on expenditure response to in-kind transfers: evidence from the Supplemental Nutrition Assistance Program” (Diary Survey), day three.
Over the last three decades, replication has increasingly been recognized as an important priority in the economics field. Replication has been especially encouraged among graduate students as a means to apply empirical methods and critique existing studies. As part of my graduate coursework, I replicated a recent study by Beatty and Tuttle (2016) that explored the labeling effect of SNAP benefits on food-at-home expenditure using the 2007–10 Family Interview Survey of the Consumer Expenditure Survey. The presentation features a review of the published paper, replication results, and strengths and weaknesses of the Consumer Expenditure Survey for this project.
Jonathan Peters, Ph.D., The College of Staten Island/The CUNY Graduate School, “Income issues in road user and transportation fees—Just who is paying for what?” (Interview Survey), day three.
There has been an explosive growth in various types of new transportation fees over the last 15 years. With growing structural deficits in state Departments of Transportation, there has been a rapid deployment of new and proposed road use charges to fill these funding gaps. Further, transportation users are moving away from private automobile ownership to greater utilization of technology-enabled transportation options (mass transit, transportation network companies [Uber, Lyft, and such], and car sharing). All of these new services will be reflected in household consumption expenditures and are altering the consumption basket of households. This project looks to explore the relative burden of road tolls and transportation fees (parking, taxis, rideshare, and motor fuel) as a component of household expenditures and compare the consumption patterns that are reported in the CE with other sources of toll burden and transportation fees by income class. The authors have data from agency user surveys and transport system usage that was collected from 2004 to 2017. These sources will be compared with CE data to examine how the reporting and measurement of toll burden and transport fees by income class have changed over the last two decades.
Staff of the CE program
Choi, Jimmy. Economist, Branch of Information and Analysis (BIA); day one
Cobet, Aaron. Senior Economist, BIA; day two
Curtin, Scott. Supervisory Economist, Chief, Microdata Section, BIA; emcee and practical training sessions; days one, two, and three
Henderson, Steve. Supervisory Economist, Chief, BIA; days one and three
Johnson-Cox, Barbara. Economist, Branch of Production and Control (P&C); day three
Knappenberger, Clayton. Economist, P&C; day three
Paulin, Geoffrey. Senior Economist, BIA; days two and three
Rojas, Arcenis. Economist, BIA; days one and two
Safir, Adam. Chief, Division of Consumer Expenditure Surveys; day one
Wilson, Taylor. Economist, BIA; day one
Other BLS speakers
Boily, Lisa. BLS New York–New Jersey Information Office, “Using food stamp identifiers in the CE Diary Survey: opportunities and challenges” (Diary Survey); day three
King, Susan. Mathematical Statistician, Division of Price Statistical Methods (DPSM); day two
Nix, Brian. Mathematical Statistician, DPSM; day two
Schau, Terry. Managing Editor, Monthly Labor Review; day two
Non-BLS speakers
Catherine Curtis, Ph.D., Oklahoma State University, “Family travel expenditures, 2005–15: patterns in consumer family travel” (Interview Survey); day one
Michael Conte, Ph.D., and Keith Myers, RegionalOneSource (ROS), “Consumer Expenditure Microdata user support website” (Interview Survey); day two
Jiyoon (June) Kim, Ph.D., Indiana-Purdue University, Fort Wayne, “Changes to low income households’ spending patterns in response to the 2013 SNAP benefit cut” (Diary Survey); day three
Dmitri Koustas, Ph.D. candidate, UC-Berkeley, “Consumption inequality and the frequency of purchases” (Interview and Diary Surveys); day two
Heather Lamoureux, Clarity Services, “Ability-to-pay: using CE microdata to proxy borrower expenses” (Interview Survey); day one
Madelaine L’Esperance, Ph.D. candidate, University of Wisconsin-Madison, “Replicating results from published work: an example based on expenditure response to in-kind transfers: evidence from the Supplemental Nutrition Assistance Program” (Diary Survey); day three
Jonathan Peters, Ph.D., The College of Staten Island/The CUNY Graduate School, “Income issues in road user and transportation fees—Just who is paying for what?” (Interview Survey); day three
Ryan Pfirrmann-Powell, U.S. Department of Agriculture Farm Service Agency, “Estimating elasticities of demand from the Consumer Expenditure Diary Survey” (Diary Survey); day one
Louis Poirier, the Bank of Canada, “Analysis of the impact of lower oil prices on American household consumption” (Interview Survey); day two
Sita Slavov, Ph.D., George Mason University, “Social Security and saving: an update” (Interview Survey); day one
Megan Sweitzer, U.S. Department of Agriculture Economic Research Service, “Comparing food-at-home expenditures: commercial scanner data and government survey data” (Diary Survey); day two
Geoffrey D. Paulin and Parvati Krishnamurty, "Consumer Expenditure Surveys Methods Symposium and Microdata Users’ Workshop, July 18–21, 2017," Monthly Labor Review, U.S. Bureau of Labor Statistics, June 2018, https://doi.org/10.21916/mlr.2018.15.
1 Although a household refers to a physical dwelling, “consumer unit” refers to the people living therein. For example, two roommates sharing an apartment constitute one household. However, if they are financially independent, they each constitute separate consumer units within the household. Similarly, although families are related by blood, marriage, or legal arrangement, unmarried partners who live together and pool income to make joint expenditure decisions constitute one consumer unit within the household. For a complete definition, see the CE glossary at https://www.bls.gov/cex/csxgloss.htm.
2 The Quarterly Interview Survey is designed to collect data on expenditures for big-ticket items (e.g., major appliances, cars, and trucks) and recurring items (e.g., payments for rent, mortgage, or insurance). In the Interview Survey, participants are visited once every 3 months for four consecutive quarters. In the Diary Survey, on the other hand, participants record expenditures daily for 2 consecutive weeks. The survey is designed to collect expenditures for small-ticket and frequently purchased items, such as detailed types of food (e.g., white bread, ground beef, butter, lettuce). The CE microdata for both surveys may be downloaded from the CE website at https://www.bls.gov/cex/pumd_data.htm.
3 The American Association for Public Opinion Research. 2016. Standard Definitions: Final Dispositions of Case Codes and Outcome Rates for Surveys. 9th edition. AAPOR.
5 As noted in the introduction to the workshop, the Interview Survey collects data from respondents for four consecutive calendar quarters. During each interview, the respondent is asked to provide information on expenditures for various items during the previous three months. However, not all participants remain in the sample for all four of these interviews. Those who do remain have different characteristics (e.g., higher rates of homeownership and average age) than those who do not remain. Therefore, attempting to analyze average annual expenditures by only examining respondents who participate for all four interviews yields biased results.
6 Elasticity is a concept in economics that measures the “sensitivity” of one factor to the change in another through the ratio of percent changes. For example, the price-elasticity of the demand for milk measures the percent change in quantity of milk purchased given a 1 percent change (increase or decrease) in the price of milk.
7 Because the practical training is progressive, until 2011 this activity was held on the second day to maximize overlap in attendance between newer and more experienced users. However, in response to comments from attendees at prior workshops, in 2012 the activity was scheduled for the first day of the workshop and successfully repeated in this order subsequently.
8 For example, suppose the threshold for a particular income or expenditure is $100. On two records, the reported values exceed this: $200 on record A and $600 on record B. In this case, the value is topcoded to $400 (the average of $200 and $600), and the reported amounts are replaced with $400. An additional variable, called a “flag,” is coded to notify the data user that the $400 values are the results of topcoding, not actual reported values.
9 For details on topcoding and suppression, including specific variables affected and their critical values, see “2016 Topcoding and Suppression,” August 29, 2017, https://www.bls.gov/cex/pumd/2016/topcoding_and_suppression.pdf. Additional information is also provided in the public-use microdata documentation for the year of interest. (See, for example, “2016 Users’ documentation, Interview Survey, Public-Use Microdata (PUMD), Consumer Expenditure,” August 29, 2017.)
10 For example, suppose a respondent reports values for two sources of income: (1) wages and salaries and (2) pensions. Suppose the following: The reported value for wages and salaries exceeds the critical value, and is therefore replaced by the topcoded value of $X; the reported value for pension income, $Y, is below the critical value for this income source; and the value for total income is shown to be $X + $Y + $Z. Because this respondent has only two sources of income reported and pension income is not topcoded, a user could subtract $Y from the total and deduce that the reported value for wages and salaries is $X + $Z. To prevent this, one must compute the total income after each individual component has been topcoded as needed. Therefore, in this example, total income is released as $X + $Y, and the actual reported value of wages and salaries cannot be “reverse engineered.”
11 The CE microdata include weights so that users can produce estimates of average expenditures per consumer unit at the national level, regional level (Northeast, Midwest, South, and West), or aggregate expenditure estimates for these areas. (For example, according to the most recent results available at the time of the writing of this report, the average consumer unit spent $7,023 on food in 2016, which amounted to more than $932 billion for the nation as a whole. Consumer units in the South accounted for the largest share of this expenditure, 35.9 percent, or more than $334 billion.) However, neither averages nor aggregate expenditures are accurately estimated at the state level using CE weights. The experimental weights are designed to provide estimates for New Jersey, and are available for 2016 PUMD files. (For these weights and related documentation, see https://www.bls.gov/cex/pumd_data.htm.) If successful, the experiment can be expanded to other states, if data collected there are sufficient to compute accurate weights. At present, possible weights for Florida and California are being studied.
12 Specifically, attendees learned how to access the EDA files to ascertain for what type of school or facility (college or university, elementary through high school, child day care center, etc.) certain educational expenditures were incurred, and whether the expenditures were for a member of the consumer unit or a gift to someone outside.
13 The standard generation tables were posted at https://www.bls.gov/cex/tables.htm on August 29, 2017. The earlier experimental generational tables are located at https://www.bls.gov/cex/csxresearchtables.htm#generational. The experimental cross-tabulated income ranges are expected to be available at https://www.bls.gov/cex/csxresearchtables.htm by mid-2018.