Measuring Impact of Top‐Coding on the Utility of Consumer Expenditure Microdata

Daniel K. Yang and Daniell Toth

Abstract

The Consumer Expenditure Survey (CE) applies a statistical disclosure limitation process known as top-coding to its public-use microdata release, concealing sensitive and identifiable information in order to protect the confidentiality of responding households. For example, this process replaces the annual income of each household at the high (low) end of the distribution with the average annual income of all such high-end (low-end) households in the public-use microdata. Top-coding can therefore affect the utility of the microdata, especially for analyses that are sensitive to the high (low) end of the distribution; for instance, both parameter estimates and confidence intervals can be biased by this process. In this study, we investigate the impact of top-coding on the utility of CE microdata for multiple regression models that analyze the relationship between certain expenditures and household income after adjusting for demographic characteristics. We conduct a bootstrap resampling study and apply a data utility measure based on a modified form of the Kullback-Leibler divergence to evaluate the effects of top-coding on the utility of the CE microdata.
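As an illustration of the mechanism described in the abstract, here is a minimal Python sketch. It is not the authors' implementation and rests on several stated assumptions. Incomes above a hypothetical cutoff (here near the 97th percentile) are replaced by the mean of the incomes above that cutoff. A single-predictor regression of expenditure on income stands in for the multiple regression models with demographic covariates, and the data are synthetic. The standard Kullback-Leibler divergence between normal approximations of the bootstrap sampling distributions of the income slope stands in for the modified divergence used in the study, whose exact form is not given here. The helper names (`top_code`, `ols_slope`, `paired_bootstrap_slopes`, `kl_normal`) are hypothetical.

```python
import math
import random
import statistics


def top_code(values, upper, lower=None):
    """Replace values above `upper` (and, if given, below `lower`) with the
    mean of the values in that tail; all other values are left unchanged."""
    coded = list(values)
    high = [v for v in values if v > upper]
    if high:
        high_mean = statistics.fmean(high)
        coded = [high_mean if v > upper else c for v, c in zip(values, coded)]
    if lower is not None:
        low = [v for v in values if v < lower]
        if low:
            low_mean = statistics.fmean(low)
            coded = [low_mean if v < lower else c for v, c in zip(values, coded)]
    return coded


def ols_slope(x, y):
    """Least-squares slope of y on x for a single predictor (a simplification
    of the multiple regression models studied)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def paired_bootstrap_slopes(x, x_coded, y, reps, rng):
    """Use the same bootstrap resample of households for the original and the
    top-coded predictor, and return the two lists of slope estimates."""
    n = len(y)
    orig, coded = [], []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        orig.append(ols_slope([x[i] for i in idx], ys))
        coded.append(ols_slope([x_coded[i] for i in idx], ys))
    return orig, coded


def kl_normal(m1, s1, m2, s2):
    """Standard (unmodified) KL divergence KL(N(m1, s1^2) || N(m2, s2^2))."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


if __name__ == "__main__":
    rng = random.Random(2024)
    # Synthetic households: log-normal income, expenditure linear in income plus noise.
    income = [rng.lognormvariate(10.8, 0.7) for _ in range(500)]
    spend = [500 + 0.1 * inc + rng.gauss(0, 2000) for inc in income]

    # Hypothetical cutoff near the 97th percentile of income.
    cutoff = sorted(income)[int(0.97 * len(income))]
    coded_income = top_code(income, upper=cutoff)

    orig, coded = paired_bootstrap_slopes(income, coded_income, spend, 200, rng)
    kl = kl_normal(statistics.fmean(orig), statistics.stdev(orig),
                   statistics.fmean(coded), statistics.stdev(coded))
    print(f"mean slope, original:  {statistics.fmean(orig):.4f}")
    print(f"mean slope, top-coded: {statistics.fmean(coded):.4f}")
    print(f"KL(original || top-coded): {kl:.4f}")
```

Because each tail value is replaced by the tail mean, the total of the variable is preserved while its spread within the tail is removed. Computing both slopes on the same bootstrap resamples separates the effect of top-coding from ordinary resampling noise.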