Augmented CPS Data on Industry and Occupation

Peter B. Meyer and Kendra Asher


The Current Population Survey (CPS) classifies the jobs of respondents into hundreds of detailed industry and occupation categories. The classification systems change periodically, creating breaks in time series. Standard concordances bridge the periods, but they often leave empty cells or introduce spurious sharp changes in the series. They also usually assume that one period can be representative, at more aggregate levels, of various historical periods. For each employed CPS respondent classified under an earlier classification system, we apply prediction algorithms, principally random forests, to impute standardized industry, occupation, and related variables. The imputations use microdata about each individual and large training data sets about the population. In some of the training data sets, specialists have classified industry and occupation into two category systems; that is, the records are dual-coded. We train a random forest classifier to handle the changes in classification between the 1990s and 2000s, largely on the dual-coded data set, and apply it to the full CPS and IPUMS-CPS to impute several variables, including industry and occupation. Where a classification change splits an industry or occupation, we train the algorithms on observations coded under the new, split categories to predict how the historical observations would have been classified. The result is an augmented CPS with additional columns of standardized industry and occupation. Augmented data sets of this kind can serve research on many topics.
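
The idea of training on dual-coded records and then imputing a new-system code for historical records can be sketched in a few lines. This is only an illustration, not the authors' method: a simple frequency-table predictor stands in for the random forest, and all codes (`OLD_A`, `NEW_A1`, ...), the single `education` covariate, and the function names are hypothetical. It shows how using an individual-level covariate lets a split category be allocated differently across respondents, where a one-to-one concordance would send every `OLD_A` record to the same new code.

```python
from collections import Counter, defaultdict

# Hypothetical dual-coded training records: (old_code, education, new_code).
# All codes and values are invented for illustration.
DUAL_CODED = [
    ("OLD_A", "college", "NEW_A1"),
    ("OLD_A", "college", "NEW_A1"),
    ("OLD_A", "college", "NEW_A2"),
    ("OLD_A", "high_school", "NEW_A2"),
    ("OLD_A", "high_school", "NEW_A2"),
    ("OLD_B", "college", "NEW_B"),
    ("OLD_B", "high_school", "NEW_B"),
]


def train(records):
    """Count new codes within each (old code, covariate) cell and per old code."""
    by_cell = defaultdict(Counter)
    by_old = defaultdict(Counter)
    for old, edu, new in records:
        by_cell[(old, edu)][new] += 1
        by_old[old][new] += 1
    return by_cell, by_old


def impute(model, old, edu):
    """Predict the most common new code for the cell, falling back to the
    old code's overall modal new code (a concordance-like rule)."""
    by_cell, by_old = model
    if (old, edu) in by_cell:
        return by_cell[(old, edu)].most_common(1)[0][0]
    if old in by_old:
        return by_old[old].most_common(1)[0][0]
    return None  # old code never seen in the training data


# Hypothetical historical records coded only under the old system.
historical = [
    {"id": 1, "old_code": "OLD_A", "education": "college"},
    {"id": 2, "old_code": "OLD_A", "education": "high_school"},
    {"id": 3, "old_code": "OLD_A", "education": "graduate"},
    {"id": 4, "old_code": "OLD_B", "education": "college"},
]

model = train(DUAL_CODED)

# "Augment" each record with an additional imputed column.
augmented = [
    dict(r, new_code_imputed=impute(model, r["old_code"], r["education"]))
    for r in historical
]

for row in augmented:
    print(row)
```

In this toy setup the two `OLD_A` respondents with different education receive different imputed codes, while the respondent whose covariate value never appears in the training data falls back to the modal mapping for `OLD_A`. A random forest generalizes the same idea to many covariates at once without requiring an exact cell match.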