Consumer Expenditure Surveys Methods Symposium and Microdata Users’ Workshop, July 12–15, 2016
The Consumer Expenditure Surveys (CE) program collects expenditures, demographics, and income data from families and households. To address CE-related topics in survey methods research, provide free training in the structure and uses of the CE microdata, and explore possibilities for collaboration, the CE program held its annual Surveys Methods Symposium and Microdata Users’ Workshop from July 12 to 15, 2016. Several economists from the CE program and other U.S. Bureau of Labor Statistics offices, along with research experts in a variety of fields, including academia, government, market research, and other private industry areas, gathered together to explore better ways to use the microdata.
The CE are the most detailed source of expenditures, demographics, and income data that the federal government collects directly from families and households (or, more precisely, “consumer units”).1 In addition to publishing standard expenditure tables twice a year, the U.S. Bureau of Labor Statistics (BLS) CE program releases annual microdata on the CE website from its two component surveys (the quarterly Interview Survey and the Diary Survey), which researchers use in a variety of fields, including academia, government, market research, and other private industry areas.2
In July 2006, the CE program office conducted the first in a series of annual workshops (1) to help users better understand the structure of the CE microdata, (2) to provide training in the uses of the surveys, and (3) to promote awareness, through presentations by current users and interactive forums, of the different ways the data are used and thus provide opportunities to explore possibilities for collaboration.
Starting in 2012, the program office added an additional day to the event. This additional day allowed the symposium to explore topics in survey methods research in support of the CE Gemini Redesign Project (Gemini Project), a major project to redesign the CE (for more information, go to https://www.bls.gov/cex/geminiproject.htm).
In addition to the CE program staff, workshop speakers have included economists from BLS regional offices and researchers not affiliated with BLS. Similarly, symposium speakers have included CE program staff, other BLS National Office staff, and speakers from outside BLS. This article describes the 2016 Surveys Methods Symposium, conducted July 12, 2016, and the 2016 Microdata Users’ Workshop, conducted July 13–15, 2016.
Surveys methods symposium
In previous years, the CE methods symposiums focused on (1) providing an update on the status of the CE Gemini Project, including results from recent supporting studies, and (2) featuring research related to the redesign. However, with planning for a large-scale feasibility test of the redesign underway and a number of design decisions still to be made, the CE symposium coordinator decided to change the format of the 2016 symposium. To gain insight into four select research topics—incentives, record use, online diaries, and individual (person-level) diaries—relevant to the ongoing redesign initiative, the CE program office invited representatives from other federal, international, and private sector surveys to share information about their existing methods and experiences as the CE program moves toward a final redesign. In addition to learning about other experiences, the CE program staff hoped that the presentations would spur productive discussions of value and interest to all participants.
The day was divided into four equal sessions, each centered on one of the four research topics. In each session, a representative from the CE program opened with a presentation on the CE experience, focusing not only on the results of the research but also on the goals to be reached related to the topic and the challenges encountered. The CE presentation was followed by two presentations, given by representatives from other surveys, about their existing methods or recently completed research relevant to the topic. At the end of each session, the CE representative moderated a discussion about the topic and the presentations, encouraging presenters and attendees to ask questions and comment.
This year, the symposium drew over 80 attendees from areas that included universities, academic programs in survey methodology, nonprofit organizations, private companies, medical-related establishments, and federal agencies. The subsequent sections, organized by research topic, review the presentations in each session; a discussion of the key takeaways from all the sessions combined follows.
The first session was on incentives. The Gemini Project includes prepaid “token” cash incentives as part of the redesign plan.
Incentives for the Consumer Expenditure Survey: past, present, and future; Ian Elkin (CE). Mr. Elkin shared a selection of results from two previous incentive tests in the CE: a test in the quarterly Interview Survey conducted from 2005 to 2007 and a test in the Diary Survey conducted in 2006. Both tests increased response rates. He described the upcoming Incentives Test in the Interview Survey, which was designed to test and assess alternative incentive structures and amounts as part of the Gemini Project. The project plan included developing a strategy for operationalizing and implementing incentives; researching and recommending incentive amounts; proposing incentive distribution procedures, including procedures to capture respondents who generally do not respond to classic incentives; and analyzing test data to determine which kind of incentive works best. Among the challenges regarding incentives that Mr. Elkin highlighted are the logistics that surround them—how best to coordinate the handling and distribution of incentives. He identified specific concerns, such as whether a “one-size-fits-all” monetary incentive is effective for gaining respondent cooperation and whether a spending summary provided to the respondent is an effective nonmonetary incentive. He also identified some potential concerns that need to be addressed, including whether incentives harm data quality or increase nonresponse bias.
How much gets you how much? Monetary incentives and response rates in household surveys, Andrew Caporaso (Westat).3 Mr. Caporaso presented results from a meta-analysis of experimental literature on incentive use. After a thorough literature search, the research team members found over 200 reports on incentive effects, 40 of which met the criteria that they defined for a meta-analysis of incentives. The meta-analysis included 55 experiments summarized in these reports and 178 conditions tested across the 55 experiments between 1987 and 2011. The analysis found that response rates depended on mode (telephone, mail, or in person), incentive size, and timing (prepaid or promised). However, the findings varied considerably across the studies included in the analysis. Incentive timing mattered the most in telephone surveys, mattered the least for in-person surveys, and was inconclusive for mail surveys. In addition, no changes in the effects of incentives over time were observed (though these effects may be mitigated by declining response rates overall). In addition to presenting the findings from the meta-analysis, Mr. Caporaso summarized findings from other studies showing that incentives work similarly in both panel surveys and cross-sectional surveys. Some evidence also indicated that incentives can help reduce other data collection costs (cost savings are greatest when these costs are high), thereby partially offsetting the cost of incentives.
The effect of large monetary incentives on survey completion: evidence from a randomized experiment with the Survey of Consumer Finances, Joanne W. Hsu (Federal Reserve Board). Dr. Hsu shared results from a randomized experiment in which the overall goal was to determine the optimal level of incentives for the Survey of Consumer Finances (SCF). The SCF is a triennial survey sponsored by the Federal Reserve Board that collects financial information from American families. The data are collected by NORC at the University of Chicago (formerly National Opinion Research Center). The experiment was conducted to inform the provision of incentives in the 2016 SCF, focusing on how the administration of the incentives and varying incentive levels affect response rates and interviewer effort. Overall, the results showed that both prepaid incentives and larger postpaid incentives increased response rates. Additionally, a prepaid incentive increased the respondent’s likelihood of completing the interview over the phone rather than through an in-person visit by the interviewer. This finding held true particularly at higher levels of postpaid incentives.
The second session focused on the use of records as a survey aid. The CE redesign plan calls for an in-person “records interview” with the respondent. The interview is intended to collect data on expenditures that a respondent would likely be able to find and more accurately report using a financial record.
Encouraging record use in the CE, Erica Yu (BLS). The CE program office recognizes that encouraging respondents to use records has many benefits, but it also brings a number of challenges. Dr. Yu highlighted possible responses to these challenges, including how to minimize burden, persuade respondents to collect and use records, address privacy concerns, and train interviewers who handle the records. She then shared methods and findings from two records-use tests: one on Interview Protocol Testing and another on an Electronic Records Survey.
Financial record checking in surveys: do prompts improve data quality? Joe Murphy (Research Triangle Institute [RTI] International). Mr. Murphy shared findings from an experiment conducted on the Community Advantage Panel Survey (CAPS). Funded by the Ford Foundation, CAPS is a longitudinal study designed to assess the economic and social impacts of homeownership on low-to-moderate-income homeowners and renters and to assess how homeowners and renters differ. The experiment was based on the premise that self-reports of financial information, such as income and assets, are particularly prone to inaccuracy and that data quality may be improved by having respondents consult financial records when reporting such information. The experiment compared two groups: one in which respondents were asked to prepare records for the interview and another in which they were not. The results showed little difference in response rates between the groups. The experiment also found that those encouraged to check records did so at a significantly higher rate than those not encouraged. One interesting finding was that survey estimates showed little evidence of differences between respondents asked to check records and those not asked. A prominent takeaway from the experiment is that, without a more directed intervention, suggestive prompts to check financial records will not influence results.
Overall, RTI found a low rate of record use (less than 20 percent). However, if the respondent was asked to use records, the rate was higher than the rate if the respondent was not asked. RTI also found that those respondents asked to check records displayed fewer behaviors that might indicate suboptimal data quality (e.g., rounding).
The presentation provided a good methodological approach for evaluating quality resulting from record use in CE studies.
Providing time to find income and expenditure records—does it help? Some evidence from the National Household Food Acquisition and Purchase Survey (FoodAPS), John A. Kirlin (U.S. Department of Agriculture [USDA]). FoodAPS is a diary survey designed to collect data on all food acquisitions by household members over a 7-day period. The survey tested an income worksheet intended to improve the accuracy of the income data collected and to reduce respondent burden. Sixty percent of households completed the worksheet in whole or in part. The worksheet was not collected, but the respondent could refer to it during the interview. Dr. Kirlin examined differences in worksheet use by demographic characteristics.
According to the study findings, those who used the worksheet tended to be female, married, and not African American; resided in a nonmetropolitan county; and lived in 1- to 3-person households. One interesting observation from the experiment was that those who were more likely to fill out a worksheet might also have been more likely to give accurate and complete information. In addition, when “primary respondents” used the worksheet, they had fewer missing income values, were less likely to find reporting errors during the review, and were more likely to follow the direction to report gross rather than net earnings.
Overall, the USDA found that use of the worksheet was associated with fewer missing items, fewer corrections during the interview, and fewer reporting errors.
The third session discussed the use of online diaries. A major component of the CE redesign plan is the introduction of an online diary option for respondents to complete the diary-keeping task, replacing the paper-and-pencil diary used in the current CE design.
What’s new with the CE online diary? Brandon Kopp (BLS). Dr. Kopp shared the development progress of the CE online diary, highlighting a number of usability and feasibility tests that have been conducted on both the mobile and desktop versions of the diary. He also noted where the CE program office currently is in its timeline in preparing the online diary for the large-scale feasibility test of the CE redesign. He noted that further work is required because the design faces a number of challenges, which include data entry, navigation/login, respondent training, field-interviewer training, and field protocols. Dr. Kopp closed with some questions for discussion, such as how respondent engagement can be increased with the online diary.
The Nielsen eDiary: how a paper diary of radio listening became an online measurement instrument, Robin Gentry (Nielsen). Ms. Gentry shared information on the work that has been done to implement an online diary to replace a paper diary of radio listening, including lessons learned from the Nielsen 2007 eDiary Implementation and later work done on the New Radio eDiary. In 2007, respondents were offered a choice between completing a paper diary and an eDiary. Unfortunately, the result was a substantial decline in response and return rates. This decline was almost certainly driven by introducing a choice of response mode—the same phenomenon has been well documented in many other studies.4 Several design flaws and technical challenges were also found, which might have contributed to the response rate declines. Many of the respondents who chose the online option thought that they had successfully submitted an eDiary when, in fact, they had not. Registrants and nonregistrants of the eDiary received the same communication, and registrants felt harassed by reminders. The implementation was then followed by a multiyear pilot study. Currently, the eDiary is not in use for production. Even though the testing looked promising, the cost of implementation per completed eDiary was comparable to that of the traditional methodology, and response rates were considerably lower. In addition, getting participation from the whole household was difficult, and respondent feedback and behavior suggested that the paper diary was easier to complete.
Improving consumer payments measurement with the Diary of Consumer Payment Choice, Kevin Foster (Boston Fed). The Diary of Consumer Payment Choice (DCPC) is focused primarily on measuring payments, with special emphasis on payment instruments and cash holdings. The Survey of Consumer Payment Choice is a companion survey that must be taken before the diarist participates in the DCPC. The DCPC is an online diary, with a paper recall aid. Noting what information would be collected online, Mr. Foster provided a sample page from the DCPC that includes instructions for completing the form. He also noted that 40 percent of people who received paper diaries reported carrying them around, though these diaries were smaller than those used for the Diary Survey and looked pocket-sized. The presentation discussed some comparisons of the collected data with other surveys and diaries and some challenges that the survey still faces.
The fourth and final session included presentations about the use of individual, or person-level, diaries. These diaries will also be a key component of the CE redesign, in place of the current household-level diary.
Gaining perspective through individual diaries, Brett McBride (BLS). Mr. McBride provided some background on the work that the CE program office has done pertaining to individual diaries, in particular the Individual Diaries Feasibility Test (IDFT) and the Proof of Concept (POC) Test. In the IDFT, online individual diaries with no incentives were tested. The test found no increased reporting with individual diaries compared with the household-level diary used in current CE production. In the POC Test, a mix of paper and online individual diaries was used, and an incentive of $20 per diary was offered. Again, no increase in reporting was noted; this held when controlling for age, education, and number of household members. When households were asked, “Would you prefer having a single diary for the whole household or having an individual diary for each member of the household?” 68.2 percent preferred a single diary, whereas only 30.2 percent preferred individual diaries. The remaining 1.6 percent responded “Don’t Know” or “Refused.” So, although the CE program observed benefits to using individual diaries (less proxy reporting, reduced recall bias, and limited burden on any one individual), households may not prefer them.
Examining response fatigue and noncooperation in the National Household FoodAPS, John A. Kirlin (USDA). Dr. Kirlin focused the presentation on nonreporting in the FoodAPS, specifically an individual member’s refusal to participate on a given day or for the entire week. After describing the survey structure for context, he shared detailed information on how often members refused to participate and the demographics of who refused. The results showed that members refused to participate on only 3.3 percent of days. When looking at demographics, he found that those most likely to refuse are older teens, adults, and seniors; those who have never married; African Americans; and nonrelatives and “other” relatives of the primary respondent. Dr. Kirlin also looked at variation of nonresponse over time and found some evidence of response fatigue. Looking ahead, Dr. Kirlin pointed out that providing extra training, reminders, or other inducements to those expected to be nonparticipants may be helpful.
The personal TV diary: moving from a household-level set-based diary to a personal-level multisource TV diary, Kate Williams (Nielsen). Nielsen tested whether respondents would complete diaries for more than one media type: radio and television (TV). The purpose was to reduce overall recruitment costs and provide a single-source measurement across media types. In the dual-media diary test, households completed two media diaries approximately 1 month apart. The sample included two groups: one group received the person-level TV diary first and the person-level radio diary second; the other group received the person-level radio diary first and the person-level TV diary second. Sample test results showed that the person-level TV diary appeared to have promise, as did the methodology of sending two media diaries about 1 month apart. Some minor issues needed to be addressed, however. For example, the out-of-home viewing checkbox confused respondents—some checked it every time they left home, not just when they viewed TV out of home. Some respondents also filled out the DVR (time-shifting) section even when they were watching live. Nielsen’s next step may be larger scale testing of the person-level diary.
Summary of symposium
With many decisions still to be made for the large-scale feasibility test of the survey redesign planned for 2019, the CE program office was grateful to the external presenters who shared their experiences about some of the key topics under consideration. The symposium served as a channel for discussing and exchanging ideas to help the CE program move closer to achieving its overall redesign goals. A selection of the CE program’s key takeaways from those discussions follows:
· From the session on incentives, an interesting takeaway was the strength of prepaid token incentives in eliciting cooperation in the SCF study. Learning how the token performs independent of a larger incentive in CE would also be interesting.
· Westat’s conclusion that incentives are effective in increasing response rates with in-person surveys, irrespective of timing (prepaid or promised), helps reaffirm the current CE plan of offering promised incentives as part of the redesign.
· NORC used incentives to bolster call-in rates. However, the CE program should check whether incentives recruit not only more respondents but also conscientious respondents.
· NORC also provided a number for respondents to call in to complete the interview and receive the incentive. Although relatively few sampled units called in, NORC estimated that having the option to call in reduced costs substantially because the cost to obtain an interview in the field is so high. This strategy is something that the CE program could investigate with any future incentives test. Although the CE are not designed to be completed by telephone, a respondent could call to schedule an interview instead of the current practice of field representatives repeatedly calling respondents.
· RTI’s study on records did not indicate that the CE program’s continued efforts to encourage respondent record use would harm response rates or quality, but no overwhelming evidence was found that showed record use increased data quality.
· Some positive findings from the FoodAPS study on records also validated CE research on the use of respondent worksheets.
· As noted, the DCPC found that a large percentage of paper-diary respondents reported carrying the diary with them. Perhaps the introduction of the CE mobile diary will have a similar effect.
· Nielsen has had many of the same struggles with online diaries as the CE program has had. Learning that Nielsen’s cost to complete an eDiary was comparable to the cost of a paper diary was surprising. The CE program should consider looking more closely at the costs for its own online diaries.
· Although Nielsen attempts to collect email addresses from all household members, it only requires that one (main) respondent provide an email if other members are resistant. The CE program will consider this strategy while finalizing the online diary protocols.
Microdata users’ workshop
Day one: The first session of the 2016 workshop opened with presenters from the CE program. Taylor Wilson provided an overview of the CE, featuring topics such as how the data are collected and published. Scott Curtin then presented an introduction to the microdata, including an explanation of their features, such as data file structure and variable naming conventions.
The session concluded with Arcenis Rojas explaining the need to balance the confidentiality concerns of respondents with the usefulness of the data to researchers. Because U.S. law (Title 13) requires confidentiality of responses, information that might potentially identify specific respondents must be removed from the data before they are released publicly. Some identifiers are direct, such as names and addresses. Others are indirect, such as extremely high expenditures or even the make and model of automobile(s) owned.
Arcenis Rojas explained methods to address these concerns. The first method, called “topcoding,” addresses reported values for income or expenditures that exceed a certain threshold, called the “critical value.” These values are replaced by an average of all values exceeding this threshold and then “flagged” as topcoded (or “bottom-coded,” in the case of large income losses).5 He also explained recoding, in which data are either made less precise (e.g., if the owned automobile was produced in 1999, the year is replaced with the decade of manufacture, i.e., “1990s”) or changed in another way (state of residence is changed from Delaware to New Jersey) to preserve both comparability and confidentiality. He next explained suppression, in which reported values are removed from the dataset. In some cases, only specific information is suppressed on a record (e.g., details of a specialized mortgage). In other cases, the entire record is removed (e.g., report of a purchase of an airplane).6 Finally, he talked about methods to eliminate “reverse engineering,” a process through which a user could deduce protected information from other information provided in the publicly available files.7
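The topcoding procedure described above can be sketched in a few lines. This is an illustrative outline only, not the CE program’s production code; the function name and the critical value used in the example are hypothetical.

```python
def topcode(values, critical_value):
    """Replace each value above the critical value with the mean of all
    such values, and flag each replaced observation (illustrative sketch)."""
    above = [v for v in values if v > critical_value]
    if not above:
        return [(v, False) for v in values]
    replacement = sum(above) / len(above)
    # Values exceeding the threshold are swapped for the group mean and
    # flagged as topcoded; all other values pass through unchanged.
    return [(replacement, True) if v > critical_value else (v, False)
            for v in values]

# With a critical value of 100, the values 150 and 250 are both
# replaced by their mean, 200.0, and flagged.
print(topcode([50, 150, 80, 250], 100))
```

The flag is what allows microdata users to distinguish a genuine reported value from a confidentiality-protected one.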
Following a break, researchers not affiliated with the CE program who have used the microdata for a variety of purposes concluded the morning presentations. Nestor Rodriguez, the first speaker, described how CE data are used in an annual USDA report that estimates the cost of raising a child from birth through age 17. Dr. Li Miao, the second presenter, described preliminary work exploring expenditures for travel (e.g., vacation) by single people compared with families. Of particular interest, Dr. Miao and her coauthors are first-time users of the CE data and are working in consultation with a CE staff member (Geoffrey Paulin).
After the lunch break, Arcenis Rojas returned to explain the details of the structures of the microdata files (naming conventions, data included, file organization, etc.) for both the Interview and Diary Surveys. The rest of the afternoon was dedicated to practical training, in which attendees performed exercises using microdata.
The day concluded with an information-sharing group session among workshop participants and CE program staff. In this open forum, attendees met informally to discuss their research and suggest ways to improve the microdata.
Day two: The second day opened with more advanced topics. Brian Nix of the BLS Division of Price Statistical Methods (DPSM) presented technical details about sampling methods and construction of sample weights, and Susan King (DPSM) presented results of her research into producing experimental weights for estimating state-level expenditures with the use of the CE microdata.8 Completing the session, Meaghan Smith (CE program) spoke on imputation and allocation of expenditure data in the CE.
The remainder of the morning was allocated to research presentations. Reprising his presentation to the 2015 workshop, researcher Walter Lake described a user-friendly online tool that he has developed. The tool allows other researchers to obtain time-series estimates from the microdata, both for demographic groups and for detailed expenditure categories not available through the CE website. Mr. Lake was specially invited to return in 2016 for several reasons. For example, his tool was still in final testing stages during the 2015 workshop but was released for public use afterward. In addition, as noted, the tool is user-friendly, and it allows users to easily compute trends in detailed expenditure categories by specific demographic groups. Although the tool is not designed to produce complex analyses (such as regression analyses), it is helpful to researchers who either do not need to produce such analyses or want to perform preliminary investigation before pursuing more complex modeling.9
Li Zhang, the primary author of the second presentation, is well known to regular workshop attendees, because he presented work at the 2014 and 2015 workshops. Unfortunately, he was unable to attend the 2016 workshop, but he worked with CE staff member Jimmy Choi, who delivered the presentation. The work explores the relationship of nongambling expenditures to consumers’ proximity to casinos. As an eligible researcher, the author was granted specially controlled access to the internal files of the CE and thus was able to use information about the location of the respondent (e.g., city of residence) to ascertain the proximity of casinos.10 He then analyzed how expenditures for several items (e.g., food at home, medical care, transportation, and alcohol and tobacco) differ with such proximity.
After a break for lunch, Brian Baker, technical writer–editor of the Monthly Labor Review (MLR), described the MLR publication process, from submission to posting, for authors interested in having their work appear in the MLR. Following Dr. Baker’s presentation, the technical instruction resumed with a topic of perennial interest to CE microdata users: the use of data from only those respondents who complete all four interviews of the Interview Survey. As noted in the introduction to the workshop, the Interview Survey collects data from respondents for four consecutive calendar quarters. During each interview, the respondent is asked to provide information on expenditures for various items during the past 3 months. However, not all participants remain in the sample for all four of these interviews. Evan Hubener (CE program) explained that those who do remain have different characteristics (e.g., higher rates of homeownership and higher average age) than those who do not remain. Therefore, attempting to analyze average annual expenditures by examining only respondents who participate in all four interviews yields biased results. Following the Hubener presentation, Aaron Cobet (CE program) described the new (since 2013) methods in the CE for estimating income taxes paid by consumer units, which replace amounts reported by consumers during their interviews, since these data have been found to be extremely unreliable.11 A break followed the conclusion of this presentation, after which practical training resumed for the rest of the afternoon.
Day three: The final day started with CE staff discussing advanced topics, such as Barbara Johnson-Cox explaining how sales taxes are applied to expenditure reports during the data production process. Next, a series of outside researchers spoke on other topics: a description of a regression tree method to estimate mean income from interest and dividends when the value of this (combined) source is missing (Wei-Yin Loh); how the National Endowment for the Arts (NEA) uses CE data to understand expenditure patterns related to the arts (e.g., fees and admissions, musical instruments, photographic equipment, reading, etc.), especially by demographic groups (Bonnie Nichols); and how one professor (Carolyn Carroll) uses the CE data to teach undergraduate students basic applied and practical statistical techniques.
In addition to a break, the morning included presentations by other CE staff. First, Steve Henderson delivered a “sneak peek” of developments for CE publications and microdata. For example, the CE program has posted new, experimental tables providing expenditures by generation of the reference person (millennial, Generation X, etc.) as a supplement to the standard age tables (under 25, 25 to 34, etc.) and will be publishing new income tables in which certain low-income groups are combined (e.g., less than $5,000 and $5,000 to $9,999 become less than $10,000) to make room for more detailed information on high-income consumers.12 In addition, of particular help to microdata users, one table includes detailed expenditures at the “all-consumer-units” level to more easily identify which items compose higher level expenditures. (For example, “food at home” includes such detailed items as lettuce, potatoes, and round steak.) He also noted the upcoming release of more detailed geographic data for microdata users (i.e., data by the nine Census divisions in addition to the Northeast, Midwest, South, and West regions) and asked for researchers’ help in assessing the impact of new rounding strategies that have been proposed to protect confidentiality. (For example, expenditures under $10 will be rounded to the nearest penny, whereas those between $10,000.00 and $99,999.99 will be rounded to the nearest $100.)
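As an illustration, the two rounding tiers mentioned above could be sketched as follows. This is a hypothetical sketch, not the CE program’s code: the function name is invented, and the treatment of amounts outside the two stated ranges is a placeholder, since the full rounding schedule was not described.

```python
def round_expenditure(amount):
    """Illustrative sketch of the proposed confidentiality rounding.

    Only the two tiers stated in the presentation are implemented;
    amounts in other ranges are returned unchanged because their
    rounding rules were not specified.
    """
    if amount < 10:
        return round(amount, 2)           # under $10: nearest penny
    if 10_000 <= amount < 100_000:
        return round(amount / 100) * 100  # $10,000.00-$99,999.99: nearest $100
    return amount                         # other tiers: rules not given here
```

For example, an expenditure of $12,345.67 would be published as $12,300 under this scheme.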
In the final presentation of the morning, Geoffrey Paulin described the correct use of imputed income data and sample weights in computing population estimates. He noted that the proper use of weights requires a special technique to account for sample design effects; if this technique is not employed, the resulting estimates of variances and regression parameters will be incorrect.
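For the multiply imputed income data, the standard way to combine results across the imputed datasets is Rubin’s rules. The sketch below illustrates that idea only; it is not the CE program’s own code, and the function name and inputs are invented for the example.

```python
import statistics

def rubin_combine(estimates, variances):
    """Combine results from m multiply imputed datasets via Rubin's rules:
    the combined point estimate is the mean of the m estimates, and the
    total variance is the average within-imputation variance plus the
    between-imputation variance inflated by (1 + 1/m)."""
    m = len(estimates)
    point = statistics.mean(estimates)            # combined point estimate
    within = statistics.mean(variances)           # avg. within-imputation variance
    between = statistics.variance(estimates)      # variance across imputations
    total_variance = within + (1 + 1 / m) * between
    return point, total_variance
```

With five imputations whose estimates are 10, 12, 11, 13, and 14 and whose variances are each 2, the combined estimate is 12 with total variance 5.0.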
After a lunch break, Aaron Cobet introduced new features of the substantially improved Public Use Microdata (PUMD) website. The updates greatly facilitate navigation of the site and include more documentation of certain topics than was available previously. Afterward, practical training continued in two parts: first, the completion of exercises in progress; second, a presentation of a computer program available with the microdata for SAS software users to easily compute correct standard errors for means and regression results when using (1) unweighted nonimputed data, (2) population-weighted nonimputed data, and (3) multiply imputed income data, both unweighted and population weighted (Paulin). Between training sessions, attendees were debriefed in a feedback session designed to solicit opinions on how to improve future workshops.
Symposium and Workshop of 2017
The next Survey Methods Symposium will be held July 18, 2017, once again in conjunction with the next microdata users’ workshop (July 19–21). Although the symposium and workshop will remain free of charge to all participants, advance registration is required. For more information about these and previous events, visit the CE website (https://www.bls.gov/cex/) and, under the left navigation bar titled “CE PUBLIC-USE MICRODATA,” look for “ANNUAL WORKSHOP.” For direct access to this information, the link is www.bls.gov/cex/csxannualworkshop.htm. Additional details about previous symposia are available at https://www.bls.gov/cex/geminimaterials.htm.
Highlights of workshop presentations
The following are highlights of the papers presented during the workshop, listed in the order of presentation. They are based on summaries written by the respective authors.
Nestor Rodriguez, USDA, “Expenditures on children by families” (Interview Survey), day one.
Since 1960, the USDA has provided estimates of expenditures on children from birth through age 17. For many years, these reports have used CE data to provide base estimates, which are adjusted with the Consumer Price Index (CPI) to account for changing prices in years between the periodic base-year updates. At present, the USDA staff is exploring ways to improve the methodology used to produce these reports, such as using a rolling series of the last 5 years as the “base” data, instead of updating older data with the CPI. Before the workshop (April 2016), USDA staff, led by Mark Lino, invited CE experts Bill Passero and Geoffrey Paulin to participate in a roundtable discussion of the proposed changes and to offer their comments. This presentation described some of the proposed changes.
Li Miao, Ph.D., Oklahoma State University, “Solo and family sojourns: a comparison of consumer travel expenditure” (Diary Survey), day one.13
Single people constitute a significant portion of travelers. However, the travel and hospitality industries generally focus on couples and families. For example, restaurants offer discounts for purchases of more than one meal (buy one, get one 50 percent off) more frequently than for single meals. Amenities at hotels, on cruises, and at other venues are often designed to entice couples or families more than single travelers. In this way, the industry may be missing significant opportunities. Singles are a growing segment of travelers. Because they have no children, they do not have to limit travel to “child friendly” excursions, and they can allocate their travel dollars in different ways. (For example, instead of taking one trip with four persons, they can take four trips.) Furthermore, they do not have to incorporate a spouse’s schedule into their planning and therefore may be freer to take trips, long or short, at different intervals than if they were married. Supporting this, the literature shows that single travelers have relatively high expenditures on trips. The CE provide data to explore these questions. For example, the Interview Survey includes detailed information on type of travel (e.g., airfares, ship fares, and gasoline on out-of-town trips), lodging expenses, activities on trips (e.g., fees and admissions, such as sporting events on out-of-town trips), and purchases of alcoholic beverages on trips. Although the Diary Survey does not collect information related to travel, it has detailed information on food away from home, including types of meals (breakfast, lunch, dinner, and snack) and venue of purchase (full service, fast food, etc.), in addition to alcoholic beverages away from home by type (beer and ale, wine, and other). This work will study expenditure patterns of singles, couples, and families to further investigate their spending on food away from home and on vacations.
Walter Lake, Senior Associate, Research Financial Security and Mobility, Pew Charitable Trusts, “Increasing the usability of the BLS CE PUMD using Stata” (Interview Survey), day two.
The BLS CE PUMD are a very rich, multifaceted set of data with a wealth of information surpassed only by the complexity of the procedures necessary to extract that information. The technical knowledge required to assemble the data before analysis creates a barrier for all but the most advanced users of statistical software packages. Lowering the barriers to entry will increase the number of researchers from a variety of fields who can access and use the data. To facilitate this, I have created an add-on package for Stata statistical software that streamlines the process of data aggregation and variable creation. Using a graphical user interface (GUI) with drop-down menus and selection buttons, the user can assemble and analyze PUMD with just a few mouse clicks. The GUI allows the user to weight the variables, run crosstabs, and output basic graphs. Two versions of the algorithm that powers the GUI are available to accommodate different levels of statistical programming prowess. The Stata add-on is fully functional, and the application has been publicly released.
Li Zhang, Ph.D., IMPAQ International, Research Associate, “The effect of casinos on the nongambling economy: evidence from nationwide household spending data” (Interview Survey), day two.
According to one source, the average family spends $600 a year on gambling, 60 percent of which is spent at casinos. To put this in perspective, this amount is more than the average family spent in 2015 (according to CE tables) on either cereal and bakery products ($518) or drugs and medical supplies ($573). It is about half of their expenditures on dinners at restaurants ($1,235) and about one-third of their expenditures on apparel and services ($1,846). As with any good, the more easily available it is, the more likely a family is to spend money on it. In addition, the more one spends on good X, the less one has to spend on good Y. Given this, the proximity of casinos is presumably an important predictor of spending on gambling. At the same time, the ramifications of these expenditures for individual households and community development are important. For example, what goods do families give up to spend more on gambling? Does the growth of casinos also spur other economic development (e.g., more restaurants or entertainment facilities nearby) or not? This study investigates these questions using multiple years (1996 to 2013) of data from the CE Interview Survey.
Wei-Yin Loh, Ph.D., Professor, University of Wisconsin—Madison, “Estimating mean interest and dividends from CE data” (Interview Survey), day three.
The goal of the work is to evaluate the performance of missing value imputation methods for estimating mean interest and dividends. Existing methods such as MICE (multiple imputation by chained equations) and AMELIA are included, as well as new methods based on classification and regression trees and forests. The methods are applied to the 2013 CE data, and their performance in terms of bias, mean squared error, and computation time is compared with the use of computer simulation.
Bonnie Nichols, NEA, Operations Research Analyst, “Accessing and analyzing consumer data about arts and entertainment” (Interview Survey), day three.
Each year, the NEA distributes substantial grants. For example, in FY 2015, the NEA awarded 19 research grants totaling $300,000; that figure is a bit higher for FY 2016. The CE data are useful to the NEA in this process in that they describe both the types of arts purchased by consumers and the characteristics of the patrons. For example, in 2014, 35-to-44-year-olds spent the most on average—$1,033—for arts, sports, and entertainment admissions; this amount is well over 4 times that spent by either the oldest group ($215 for those 75 and older) or the youngest group ($246 for those under 25).
Carolyn Carroll, Ph.D., Stat Tech Inc., Senior Statistician, “Who spends the most on meals? Undergraduate consumer behavior as reported by undergraduates” (Diary Survey), day three.
Undergraduate students majoring in business, economics, nursing, and a few other fields are often required to complete an undergraduate statistics course and, in some cases, learn to use tools to analyze data. Much has been written about the under-35 population. From the point of view of someone who works with and teaches this age group, I could characterize them as the “show me” cohort, because they are the “I don’t believe you” (or anyone who is not a peer) cohort. Teaching undergraduate statistics opens up the possibility of “showing them” what they and their cohorts do and, at the same time, teaching them about research methods. Students often underestimate the difficulties in working together and in designing, carrying out, and reporting on research. Students can successfully write hypotheses, design field work procedures, and collect data by using portions of the CE (e.g., expenditures for food for home consumption or outside consumption). Students can test their hypotheses and use their data and other publicly available data to reflect on their own and their cohort’s behaviors (e.g., how much it really costs to buy coffee, lunch, or other meals away from home, comparing “local” expenditures with national expenditures by demographic category). The CE help today’s young people understand more about their behavior; look at contributors to future debt (because of student loans), obesity, and perhaps even other topics of interest today; and at the same time understand research methods and the work of at least one of the federal statistical agencies—BLS.
Staff of the CE program
Choi, Jimmy. Economist, Branch of Information and Analysis (BIA); day two
Cobet, Aaron. Senior Economist, BIA; days two and three
Curtin, Scott. Supervisory Economist, Chief, Microdata Section, BIA; day one
Henderson, Steve. Supervisory Economist, Chief, BIA; days one and three
Hubener, Evan. Economist, BIA; day two
Johnson-Cox, Barbara. Economist, Branch of Production and Control (P&C); day three
Paulin, Geoffrey. Senior Economist, BIA; day three
Rojas, Arcenis. Economist, BIA; day one
Smith, Meaghan. Supervisory Economist, Chief, Phase 3 Section, P&C; day two
Wilson, Taylor. Economist, BIA; day one
Other BLS speakers
Baker, Brian. Technical Writer–Editor, Monthly Labor Review Branch; day two
Nix, Brian. Mathematical Statistician, Division of Price Statistical Methods (DPSM); day two
King, Susan. Mathematical Statistician, DPSM; day two
Presenters of papers
Carroll, Carolyn. “Who spends the most on meals? Undergraduate consumer behavior as reported by undergraduates” (Diary Survey), day three
Lake, Walter. “Increasing the usability of the BLS CE PUMD using Stata” (Interview Survey), day two
Loh, Wei-Yin. “Estimating mean interest and dividends from CE data” (Interview Survey), day three
Miao, Li. “Solo and family sojourns: a comparison of consumer travel expenditure” (Interview and Diary Surveys), day one
Nichols, Bonnie. “Accessing and analyzing consumer data about arts and entertainment” (Interview Survey), day three
Rodriguez, Nestor. “Expenditures on children by families” (Interview Survey), day one
Zhang, Li. “The effect of casinos on the nongambling economy: evidence from nationwide household spending data” (Interview Survey) (presented by Jimmy Choi, BLS), day two
Geoffrey Paulin and Nhien To, "Consumer Expenditure Surveys Methods Symposium and Microdata Users’ Workshop, July 12–15, 2016," Monthly Labor Review, U.S. Bureau of Labor Statistics, May 2017, https://doi.org/10.21916/mlr.2017.15.
1 Although a household refers to a physical dwelling, “consumer unit” refers to the people living therein. For example, two roommates sharing an apartment constitute one household. However, if they are financially independent, they each constitute separate consumer units within the household. Similarly, although families are related by blood, marriage, or legal arrangement, unmarried partners who live together and pool income to make joint expenditure decisions constitute one consumer unit within the household. For a complete definition, see the CE glossary at https://www.bls.gov/cex/csxgloss.htm.
2 The Quarterly Interview Survey is designed to collect data on expenditures for big-ticket items (e.g., major appliances, cars, and trucks) and recurring items (e.g., payments for rent, mortgage, or insurance). In the Interview Survey, participants are visited once every 3 months for four consecutive quarters. In the Diary Survey, on the other hand, participants record expenditures daily for 2 consecutive weeks. The survey is designed to collect expenditures for small-ticket and frequently purchased items, such as detailed types of food (e.g., white bread, ground beef, butter, lettuce). The CE microdata for both surveys may be downloaded on the CE website at https://www.bls.gov/cex/pumd_data.htm.
3 Andrew Mercer, Andrew Caporaso, David Cantor, and Reanne Townsend, “How much gets you how much? Monetary incentives and response rates in household surveys,” Public Opinion Quarterly, vol. 79, no. 1, 2015, pp. 105–129.
4 Rebecca L. Medway and Jenna Fulton, “When more gets you less: a meta-analysis of the effect of concurrent web options on mail survey response rates,” Public Opinion Quarterly, vol. 76, no. 4, 2012, pp. 733–746.
5 For example, suppose the threshold for a particular income or expenditure is $100. On two records, the reported values exceed this: $200 on record A and $600 on record B. In this case, the value is topcoded to $400 (the average of $200 and $600), and the reported amounts are replaced with $400. An additional variable, called a “flag,” is coded to notify the data user that the $400 values are the results of topcoding, not actual reported values.
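A minimal sketch of this topcoding scheme follows; the function name and the “T” flag value are illustrative, not actual CE variable conventions.

```python
def topcode(values, threshold):
    """Replace every value above the threshold with the mean of all
    such values, and return a flag marking each replaced record."""
    above = [v for v in values if v > threshold]
    mean_above = sum(above) / len(above) if above else None
    result = []
    for v in values:
        if v > threshold:
            result.append((mean_above, "T"))  # topcoded; flag set
        else:
            result.append((v, ""))            # reported value kept as is
    return result
```

With the footnote’s example, `topcode([200, 600, 50], 100)` replaces both $200 and $600 with $400 and flags those records, while the $50 record is unchanged.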
6 For details on topcoding and suppression, including specific variables affected and their critical values, see “2015 topcoding and suppression,” August 30, 2016, https://www.bls.gov/cex/pumd/2015/topcoding_and_suppression.pdf. Additional information is also provided in the public-use microdata documentation for the year of interest. (See, for example, “2015 users’ documentation, Interview Survey, Public-Use Microdata (PUMD), consumer expenditure,” August 30, 2016, https://www.bls.gov/cex/2015/csxintvw.pdf.)
7 For example, suppose a respondent reports values for two sources of income: (1) wages and salaries and (2) pensions. Suppose the following: the reported value for wages and salaries exceeds the critical value and is therefore replaced by the topcoded value of $X; the reported value for pension income, $Y, is below the critical value for this income source; and the value for total income is shown to be $X + $Y + $Z. Because this respondent only has two sources of income reported and pension income is not topcoded, it is easy to see that the reported value for wages and salaries is $X + $Z. To prevent this, one must compute the total income after each individual component has been topcoded as needed. Therefore, in this example, total income is $X + $Y and the actual reported value of wages and salaries cannot be “reverse engineered.”
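The order of operations described in this footnote can be sketched as follows; all names and data structures here are illustrative, not actual CE processing code.

```python
def published_total(components, critical_values, topcoded_values):
    """Compute total income by summing each component *after* topcoding,
    so the published total is consistent with the published components
    and the original reported amounts cannot be reverse engineered."""
    total = 0.0
    for source, amount in components.items():
        if amount > critical_values[source]:
            total += topcoded_values[source]  # topcoded value replaces the report
        else:
            total += amount                   # below the critical value: keep as is
    return total
```

In the footnote’s notation, this yields a total of $X + $Y (the topcoded wages plus the reported pension), rather than $X + $Y + $Z.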
8 The CE microdata include weights so that users can produce estimates of average expenditures per consumer unit at the national (U.S.) level, the regional level (Northeast, Midwest, South, and West), or aggregate expenditure estimates for these areas. (For example, according to the most recent results available at the time of the writing of this report, the average consumer unit spent $7,023 on food in 2015, which amounted to more than $900 billion for the nation as a whole; consumer units in the South accounted for the largest share of this expenditure, 35.8 percent, or more than $322 billion.) However, neither averages nor aggregate expenditures are accurately estimated at the state level using CE weights. The experimental weights are designed to provide estimates for New Jersey. If successful, the experiment can be expanded to other states, if data collected there are sufficient to compute accurate weights.
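The relationship between the per-consumer-unit average and the aggregate estimate can be sketched with made-up data; the function and the sample values below are invented for illustration only.

```python
def weighted_estimates(expenditures, weights):
    """Population-weighted mean expenditure per consumer unit, and the
    implied aggregate expenditure for the whole population: each record's
    weight is the number of consumer units it represents."""
    aggregate = sum(x * w for x, w in zip(expenditures, weights))
    population = sum(weights)                 # total consumer units represented
    return aggregate / population, aggregate
```

For example, two records with expenditures of $100 and $200 and weights of 2 and 3 represent 5 consumer units, an aggregate of $800, and a mean of $160 per consumer unit.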
9 For an overview of the tool, called “Kiwi,” see https://www.bls.gov/cex/workshop/2016/microdata/walter-lake-kiwi.pdf. For access to Kiwi, see https://github.com/Kiwi-den-den/KIWI.
12 The experimental generational tables are located at https://www.bls.gov/cex/csxresearchtables.htm#generational. The new income ranges were first published on August 30, 2016, at https://www.bls.gov/cex/2015/combined/income.pdf.
13 The results presented were on purchases of food away from home and are derived from the Diary Survey. However, the larger project will also use data from the Interview Survey when investigating expenditures related to travel.