# New Occupational Separations Methodology

## Summary

This document describes the methods used to produce the new metric for occupational separations. The method involves two separate calculations: one for workers who leave an occupation and find employment in a different occupation (occupational transfers), and one for workers who leave the labor force entirely (labor force exits). The two methods are described in detail below; some aspects are the same for both but are repeated so that each description is complete on its own. Both methods use CPS microdata files, and the narratives include the microdata variable names used in performing the calculations. CPS microdata files and data dictionaries are available from the Census Bureau website at https://thedataweb.rm.census.gov/ftp/cps_ftp.html.

## 1. Occupational Transfers

### Data Source

To estimate workers who leave an occupation to enter a different occupation, we first estimate historical transfers using data from the CPS Annual Social and Economic Supplement (sometimes known as the March supplement). Question 46 in the supplement asks respondents whether the longest job they held in the prior calendar year is the same as their current job. If it is not, question 47 asks them what that job was, including information about the occupation, industry, and class of worker. This data is coded into supplement variables that define the longest job in the previous year. From these variables, we use OCCUP, the detailed occupation; WEMOCG, the SOC major occupational group; WEMIND, the NAICS major industry group; and LJCW, the class of worker. Supplement question 41 asks respondents how many hours they usually worked per week in the prior calendar year. In the supplement variable HRCHECK, the interviewer codes the worker as part time (1-34 hours per week) or full time (35+ hours per week) based on the response to question 41.

We also use demographic data on workers from the supplement, specifically age (A_AGE), sex (A_SEX), race (PRDTRACE), ethnicity (PEHSPNON), citizenship (PRCITSHP), educational attainment (A_HGA), and current occupation (PEIOOCC for detailed occupation and A_DTOCC for major group). Finally, we note the year the data were collected.

### Historical Data Regression

We do not look directly at historical occupational transfers by occupation, because many occupations would have small and unreliable sample sizes. Instead, we use a regression to determine how different factors affect the likelihood of transferring.

The regression uses a probit model, with the specification as follows:

*Prob(transfer) = ƒ(age, sex, occupation, education, occupation × education, race, ethnicity, citizenship, full time status, class of worker, industry, year)*

The dependent variable is defined as a respondent employed in a different major occupational group than in the prior calendar year (WEMOCG not equal to A_DTOCC).

For the independent variables:

- Age is a categorical variable that splits respondents into age cohorts: 16-17, 18-19, 5-year cohorts from 20-24 through 70-74, and 75 or older. Respondents are assigned to an age cohort based on the age they were in the previous calendar year, defined as A_AGE – 1.
- Sex is a categorical variable with two options, male and female. The respondent's current sex as defined by A_SEX is used.
- Occupation is a categorical variable that classifies respondents into one of 22 SOC major occupational groups, representing the group the respondent was employed in during the prior calendar year. This is taken from the WEMOCG variable.
- Education is a categorical variable for a respondent's highest level of educational attainment: less than high school; high school; some college, no degree; associate's degree; bachelor's degree; master's degree; and doctoral/first professional degree. The respondent's education level, as defined by A_HGA, is used. The occupation-education interaction is included because workers within an occupation have varying transfer patterns based on their highest level of educational attainment.
- Race is a categorical variable that classifies respondents as white only, black only, Asian only, or "other" (including American Indian or Alaskan Native only, Hawaiian/Pacific Islander only, and any combination of two or more races). The respondent's race, as defined by PRDTRACE, is used.
- Ethnicity is a categorical variable with two options, representing whether or not the respondent is Spanish, Hispanic, or Latino. The respondent's ethnicity, as defined by PEHSPNON, is used.
- Citizenship is a categorical variable with three options: native United States citizen, foreign-born naturalized U.S. citizen, and foreign-born non-U.S. citizen. The respondent's current citizenship, as defined by PRCITSHP, is used.
- Full time status is a categorical variable with two options: part-time worker or full-time worker. This is taken from the HRCHECK variable.
- Class of worker is a categorical variable with four options based on the ownership of the organization that employed the respondent during their longest job in the previous year: private; government (federal, state, or local); self-employed incorporated; or self-employed not incorporated. This is based on the LJCW variable.
- Industry is a categorical variable that classifies respondents into one of 13 major NAICS industry groups, representing the industry the respondent was employed in during the prior calendar year. This is taken from the WEMIND variable.
- Year is a categorical variable that identifies during what year the data were collected.
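
The age cohort assignment described in the list above can be sketched as follows. This is a minimal illustration, not the production code: the function name and the cohort labels are assumptions, and only the cohort boundaries and the A_AGE – 1 rule come from the text.

```python
def age_cohort(a_age: int) -> str:
    """Return an age cohort label based on age in the prior calendar year (A_AGE - 1)."""
    prior_age = a_age - 1
    if prior_age < 16:
        raise ValueError("respondent was under 16 in the prior year")
    if prior_age <= 17:
        return "16-17"
    if prior_age <= 19:
        return "18-19"
    if prior_age >= 75:
        return "75+"
    # Remaining ages fall into 5-year cohorts: 20-24, 25-29, ..., 70-74.
    lower = (prior_age // 5) * 5
    return f"{lower}-{lower + 4}"
```

For example, a respondent with A_AGE of 21 was 20 in the prior year and falls in the 20-24 cohort.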

The universe is all respondents in the ASEC supplement who meet all of the following criteria:

- Employed in the previous year (WORKYN = 1) – we start with a pool of people who worked in the previous year, so that we can determine what percent of those people transferred occupations
- Have a valid occupation code in the current year (A_DTOCC > 0) – we need to be able to determine whether they switched occupations – *or* are out of the labor force in the current year (ALFSR = 7) – we want to count these people as being employed in the previous year but not transferring occupations (these labor force exits will be counted in the other portion of the separations method)
- At least 16 years of age in the previous year (A_AGE > 16) – this is to match the definition of who is included in the National Employment Matrix (note that the age in the previous year is calculated as A_AGE – 1, so A_AGE – 1 >= 16 is equivalent to A_AGE > 16)
- Not an unpaid family worker in the previous year (CLWK < 4) – this is to match the definition of who is included in the National Employment Matrix
- Not in the armed forces in the previous year (WEMOCG < 23) – this is to match the definition of who is included in the National Employment Matrix
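
A minimal sketch of the universe criteria and the dependent variable, assuming each supplement record is held as a dictionary keyed by variable name. The function names are hypothetical, and treating every respondent who is out of the labor force as a non-transfer is our reading of the criteria above rather than a documented rule.

```python
def in_transfer_universe(r: dict) -> bool:
    """Apply the universe criteria to one supplement record."""
    worked_last_year = r["WORKYN"] == 1
    # Either a valid current occupation code, or out of the labor force this year.
    valid_occ_or_exited = r["A_DTOCC"] > 0 or r["ALFSR"] == 7
    at_least_16_last_year = r["A_AGE"] - 1 >= 16
    not_unpaid_family_worker = r["CLWK"] < 4
    not_armed_forces = r["WEMOCG"] < 23
    return (
        worked_last_year
        and valid_occ_or_exited
        and at_least_16_last_year
        and not_unpaid_family_worker
        and not_armed_forces
    )


def transferred(r: dict) -> int:
    """Dependent variable: 1 if the current major group differs from last year's."""
    if r["ALFSR"] == 7:
        # Assumption: labor force exits are counted as non-transfers here.
        return 0
    return int(r["WEMOCG"] != r["A_DTOCC"])
```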

Data from 10 supplement years are used to improve sample sizes and mitigate cyclical issues.

The output of the regression is a series of coefficients for each of the independent variables.

### Projection

The coefficients of the historical data regression provide information on the probability that a worker with those characteristics will leave their current occupation for another occupation. To project the number of workers who are expected to transfer, we apply these coefficients to the current demographic structure of occupations. Current data on occupations comes from the monthly CPS data. All months from the current base year are used, along with all months from the previous year in order to boost sample sizes. The independent variables are taken directly from each monthly data respondent, for all respondents who are employed (ALFSR = 1 or 2). Note that variable names in the monthly data do not always match the variable names from the supplement; equivalent variables are identified and used.

The parameters for each respondent, plus the regression coefficients, generate a z-score for the probability that that worker will leave their current occupation for another occupation. The z-score is converted into a numeric 0-1 probability for each respondent. That probability is multiplied by the respondent weight (PWCMPWG) and then summed by occupation to generate a numeric value for the number of workers in that occupation projected to transfer. This is divided by total employment in the occupation to generate a rate of transfers for each occupation.
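
The conversion from z-score to probability and the weighted aggregation by occupation can be sketched as below. In a probit model the probability is the standard normal cumulative distribution function evaluated at the z-score. The function name and the tuple layout are assumptions, and total employment is taken here to be the sum of respondent weights in the occupation, which the text does not spell out.

```python
from collections import defaultdict
from statistics import NormalDist


def occupation_rates(respondents):
    """Compute a projected rate per occupation.

    respondents: iterable of (occupation, z_score, weight) tuples.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    projected = defaultdict(float)
    employment = defaultdict(float)
    for occupation, z_score, weight in respondents:
        probability = std_normal.cdf(z_score)  # probit link: z-score -> 0-1 probability
        projected[occupation] += probability * weight
        employment[occupation] += weight
    return {occ: projected[occ] / employment[occ] for occ in employment}
```

Because each probability is weighted before summing, the resulting rate is a weighted average of individual probabilities within the occupation.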

### Conversion to Projection for Detailed and Summary SOC Occupations

The model generates the percent of workers projected to transfer occupations over a nine-month period of time. This nine-month rate is divided by nine to get a monthly rate and then multiplied by 12 to get the annual rate provided in the data. The model rates are available for the 533 detailed occupations available in the CPS microdata. In cases where multiple SOC occupations are aggregated into one CPS occupation, the rate from the CPS occupation is assigned to all of the component SOC occupations. As a result, some detailed SOC occupations have identical rates.
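
As a small illustration of the arithmetic above, the following sketch annualizes a nine-month rate and copies a CPS occupation's rate to its component SOC occupations. The function names and the occupation keys are hypothetical placeholders, not real CPS or SOC codes.

```python
def annualize(nine_month_rate: float) -> float:
    """Convert a nine-month rate to an annual rate: divide by 9, multiply by 12."""
    return nine_month_rate / 9 * 12


def assign_to_soc(cps_rates: dict, cps_to_soc: dict) -> dict:
    """Give every component SOC occupation the rate of its parent CPS occupation."""
    return {
        soc: cps_rates[cps]
        for cps, soc_list in cps_to_soc.items()
        for soc in soc_list
    }
```

Under this conversion, a nine-month rate of 9 percent corresponds to an annual rate of 12 percent.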

Annual occupational transfer rates are applied to the average of base and projected employment for an occupation to develop a projection for the average annual number of occupational transfers over the projection period. For summary SOC occupations, the number of occupational transfers from component detailed occupations is summed and used to calculate an occupational transfer rate for the summary occupation.
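
For a detailed occupation, the projection step amounts to multiplying the annual rate by the midpoint of base and projected employment. A minimal sketch, with a hypothetical function name:

```python
def annual_transfers(annual_rate: float, base_employment: float, projected_employment: float) -> float:
    """Average annual transfers: rate applied to the average of base and projected employment."""
    average_employment = (base_employment + projected_employment) / 2
    return annual_rate * average_employment
```

For instance, a 5 percent annual rate applied to base employment of 1,000 and projected employment of 1,200 uses an average employment of 1,100.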

## 2. Labor Force Exits

### Data Source

To estimate workers who leave the labor force entirely, we first estimate historical exits using data from the monthly CPS. Monthly CPS data includes respondents who are in the sample for consecutive months, on an in-for-4, out-for-8, in-for-4 month pattern. Individual respondents are matched by records with equivalent household IDs (HRHHID and HRHHID2), person line number (PULINENO), sex (PESEX), race (PTDTRACE), and age (PEAGE). Once the universe of matched records is created, respondents are identified as labor force leavers if they were in the labor force (PEMLR 1-4) for each of the first four months of their rotation but out of the labor force (PEMLR 5-7) for each of the second four months, or as labor force stayers if they were in the labor force for the entirety of both four-month periods. Respondents with all other combinations of labor force status are excluded from the data.
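
The leaver and stayer rules can be sketched as follows, assuming the eight monthly PEMLR codes for a matched respondent are already collected in month-in-sample order. The function name and the returned labels are hypothetical.

```python
IN_LABOR_FORCE = {1, 2, 3, 4}   # PEMLR 1-4
OUT_OF_LABOR_FORCE = {5, 6, 7}  # PEMLR 5-7


def classify_rotation(pemlr_by_month):
    """Classify a matched respondent from eight PEMLR codes (months 1-8 in sample).

    Returns "leaver", "stayer", or None when the respondent is excluded.
    """
    if len(pemlr_by_month) != 8:
        raise ValueError("expected eight monthly PEMLR codes")
    first_four, second_four = pemlr_by_month[:4], pemlr_by_month[4:]
    # Both groups require labor force participation in each of the first four months.
    if not all(code in IN_LABOR_FORCE for code in first_four):
        return None
    if all(code in OUT_OF_LABOR_FORCE for code in second_four):
        return "leaver"
    if all(code in IN_LABOR_FORCE for code in second_four):
        return "stayer"
    return None  # mixed status in the second four months
```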

### Historical Data Regression

We do not look directly at historical labor force exits by occupation, because many occupations would have small and unreliable sample sizes. Instead, we use a regression to determine how different factors affect the likelihood of exiting.

The regression uses a probit model, with the specification as follows:

*Prob(exit) = ƒ(age, sex, age × sex, occupation, education, occupation × education, race, ethnicity, citizenship, full time status, class of worker, industry, year)*

Following the rules stated above, the dependent variable is defined as 1 if the respondent left the labor force and 0 if the respondent remained in the labor force.

For the independent variables:

- Age is a categorical variable that splits respondents into age cohorts: 16-17, 18-19, 5-year cohorts from 20-24 through 75-79, and 80 or older. Respondents are assigned to an age cohort based on the age they were in month 4, as reported in PEAGE.
- Sex is a categorical variable with two options, male and female. The respondent's sex in month 4, as defined by PESEX, is used. The age-sex interaction is included because the pattern of leaving the labor force by age differs between men and women.
- Occupation is a categorical variable that classifies respondents into one of 22 SOC major occupational groups, representing the major group of the respondent during month 4. This is taken from the PRDTOCC1 variable.
- Education is a categorical variable for a respondent's highest level of educational attainment: less than high school; high school; some college, no degree; associate's degree; bachelor's degree; master's degree; and doctoral/first professional degree. The respondent's education level in month 4, as defined by PEEDUCA, is used. The occupation-education interaction is included because workers within an occupation have varying exit patterns based on their highest level of educational attainment.
- Race is a categorical variable that classifies respondents as white only, black only, Asian only, or "other" (including American Indian or Alaskan Native only, Hawaiian/Pacific Islander only, and any combination of two or more races). The respondent's race in month 4, as defined by PTDTRACE, is used.
- Ethnicity is a categorical variable with two options, representing whether or not the respondent is Spanish, Hispanic, or Latino. The respondent's ethnicity in month 4, as defined by PEHSPNON, is used.
- Citizenship is a categorical variable with three options: native United States citizen, foreign-born naturalized U.S. citizen, and foreign-born non-U.S. citizen. The respondent's citizenship in month 4, as defined by PRCITSHP, is used.
- Full time status is a categorical variable with two options: part time labor force or full time labor force. This is taken from the PRFTLF variable.
- Class of worker is a categorical variable with four options based on the ownership of the organization that employed the respondent in their primary job during month 4: private; government (federal, state, or local); self-employed incorporated; or self-employed not incorporated. This is based on the PEIO1COW variable.
- Industry is a categorical variable that classifies respondents into one of 13 major NAICS industry groups, representing the industry of the respondent's primary job during month 4. This is taken from the PRMJIND1 variable.
- Year is a categorical variable that identifies the year in which the first month of data (5th overall) was collected during the respondent's second four-month rotation in the sample. The year is based on the HRYEAR4 variable and the month is based on the HRMONTH variable.

The universe is all respondents in the monthly CPS data who meet all of the following criteria:

- Able to be matched between both four-month periods
- Have one of the labor force patterns defined above (in the labor force all eight months, or in the labor force each of the first four months and out of the labor force in each of the last four months)
- Have a valid occupation code in month 4 (PEIO1OCD not equal to -1)
- Not an unpaid family worker in month 4 (PEIO1COW not equal to 8)
- Not in the armed forces in month 4 (PEIO1OCD not equal to 9840)
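
A minimal sketch of these criteria as a filter on a respondent's month 4 record, held as a dictionary keyed by variable name. The function name, the `matched` flag, and the `pattern` labels ("leaver", "stayer") are hypothetical stand-ins for the outputs of the matching and labor force classification steps.

```python
def in_exit_universe(matched: bool, pattern, month4: dict) -> bool:
    """Apply the labor force exit universe criteria to one respondent."""
    has_valid_pattern = pattern in ("leaver", "stayer")
    has_valid_occupation = month4["PEIO1OCD"] != -1
    not_unpaid_family_worker = month4["PEIO1COW"] != 8
    not_armed_forces = month4["PEIO1OCD"] != 9840
    return bool(
        matched
        and has_valid_pattern
        and has_valid_occupation
        and not_unpaid_family_worker
        and not_armed_forces
    )
```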

Ten years of year-to-year matched data are used to improve sample sizes and mitigate cyclical factors.

The output of the regression is a series of coefficients for each of the independent variables.

### Projection

The coefficients of the historical data regression provide information on the probability that a worker with those characteristics will leave the labor force. To project the number of workers who are expected to leave, we apply these coefficients to the current demographic structure of occupations. Current data on occupations comes from the monthly CPS data. All months from the current base year are used, along with all months from the previous year in order to boost sample sizes. The independent variables are taken directly from each monthly data respondent, for all respondents who are employed (PEMLR = 1 or 2).

The parameters for each respondent, plus the regression coefficients, generate a z-score for the probability that that worker will leave the labor force. The z-score is converted into a numeric 0-1 probability for each respondent. That probability is multiplied by the respondent weight (PWCMPWG) and then summed by occupation to generate a numeric value for the number of workers in that occupation projected to leave. This is divided by total employment in the occupation to generate a rate of leaving for each occupation.
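
To illustrate how a respondent's characteristics and the regression coefficients combine into a z-score, the sketch below sums an intercept, one coefficient per categorical level, and one coefficient per interaction term. The data layout, function name, and the presence of an intercept are assumptions; levels without a stored coefficient are treated as the omitted reference category. Any coefficient values passed in are the caller's, and nothing here reproduces actual model estimates.

```python
from statistics import NormalDist


def probit_index(intercept: float, coefficients: dict, respondent: dict, interactions=()) -> float:
    """Build the probit z-score for one respondent.

    coefficients: maps (variable, level) to a main-effect coefficient and
        (("a", "b"), (level_a, level_b)) to an interaction coefficient.
    respondent: maps variable name to the respondent's categorical level.
    interactions: pairs of variable names, e.g. [("age", "sex")].
    """
    z = intercept
    for variable, level in respondent.items():
        z += coefficients.get((variable, level), 0.0)  # reference levels add 0
    for a, b in interactions:
        z += coefficients.get(((a, b), (respondent[a], respondent[b])), 0.0)
    return z


def exit_probability(z_score: float) -> float:
    """Convert the z-score to a 0-1 probability with the standard normal CDF."""
    return NormalDist().cdf(z_score)
```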

### Conversion to Projection for Detailed and Summary SOC Occupations

The model generates the percent of workers projected to exit the labor force over a nine-month period of time. This nine-month rate is divided by nine to get a monthly rate and then multiplied by 12 to get the annual rate provided in the data. The model rates are available for the 533 detailed occupations available in the CPS microdata. In cases where multiple SOC occupations are aggregated into one CPS occupation, the rate from the CPS occupation is assigned to all of the component SOC occupations. As a result, some detailed SOC occupations have identical rates.

Annual labor force exit rates are applied to the average of base and projected employment for an occupation to develop a projection for the average annual number of labor force exits over the projection period. For summary SOC occupations, the number of labor force exits from component detailed occupations is summed and used to calculate a labor force exit rate for the summary occupation.
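
The summary occupation calculation can be sketched as below. The text states that exits from component detailed occupations are summed; dividing that sum by the summed average employment of the same components is our assumption about the denominator. The function name and tuple layout are hypothetical.

```python
def summary_exit_rate(detailed):
    """Exit rate for a summary occupation.

    detailed: list of (annual_rate, base_employment, projected_employment)
        tuples, one per component detailed occupation.
    """
    total_exits = 0.0
    total_average_employment = 0.0
    for annual_rate, base_employment, projected_employment in detailed:
        average_employment = (base_employment + projected_employment) / 2
        total_exits += annual_rate * average_employment
        total_average_employment += average_employment
    return total_exits / total_average_employment
```

The result is an employment-weighted average of the component rates, so larger detailed occupations have more influence on the summary rate.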

**Last Modified Date:** October 24, 2017