The Consumer Expenditure Survey (CE) applies a statistical disclosure limitation process known as "top-coding" to its public-use microdata release, concealing sensitive and identifiable information in order to protect households' confidentiality. For example, this process replaces the annual income of each high-end (low-end) household with the average annual income of all high-end (low-end) households in the public-use microdata. Top-coding can numerically affect the utility of the microdata, especially for analyses that are sensitive to the high (low) end of the distribution. For instance, parameter estimates and confidence intervals can both be biased by this process. In this study, we investigate the impact of top-coding on CE microdata utility for multiple regression and logistic regression models used to analyze the relationship between certain expenditures and household income after adjusting for demographic characteristics. We employ a multiple integration approach to estimate the empirical cumulative distribution function (ECDF) and an Anderson–Darling distance (A-DD) measurement to investigate the effects of top-coding on the utility of the CE microdata. We then evaluate A-DD in the context of a two-stage economic model and explore a robust logistic regression method for the propensity of expenditure reporting to offset the impacts of top-coding.
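The replacement step described above can be illustrated with a minimal Python sketch. It is not the CE production procedure: the function name `top_code`, the synthetic lognormal incomes, and the 97th-percentile cutoff are all assumptions chosen for illustration, and only the high end is handled. The sketch shows why the overall mean survives top-coding while the spread and the extreme values do not.

```python
import numpy as np


def top_code(values, upper_cutoff):
    """Replace every value above upper_cutoff with the mean of those values.

    Hypothetical helper for illustration; the actual CE cutoffs and rules
    are not reproduced here.
    """
    values = np.asarray(values, dtype=float)
    coded = values.copy()
    mask = values > upper_cutoff
    if mask.any():
        # All high-end values share one number: their own average.
        coded[mask] = values[mask].mean()
    return coded


# Synthetic incomes (assumption: lognormal, not real CE data).
rng = np.random.default_rng(0)
income = rng.lognormal(mean=11.0, sigma=0.8, size=10_000)

# Hypothetical cutoff at the 97th percentile of the synthetic sample.
cutoff = np.quantile(income, 0.97)
coded = top_code(income, cutoff)

print("mean   original vs top-coded:", income.mean(), coded.mean())
print("stddev original vs top-coded:", income.std(), coded.std())
print("max    original vs top-coded:", income.max(), coded.max())
```

Because the replaced values are swapped for their own average, the sample mean is unchanged, but the standard deviation and the maximum both shrink. That loss of information in the tail is the kind of distortion that can carry through to regression estimates and confidence intervals.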