Article
February 2025

Supplementing state employment records with demographic data

Administrative records maintained by state governments to administer unemployment insurance (UI) programs provide a valuable resource for economic and policy research. These records contain rich longitudinal, worker-level information but lack data on demographic characteristics of workers, including sex, age, and racial and ethnic identity. This article describes the development of a new state-level data resource based on UI data that links employment records to demographic identifiers compiled from other state agency records. We describe the assembly process and show how the resulting data compare to American Community Survey (ACS) estimates. We are able to replicate the overall population of workers to within 1 percent to 4 percent of the ACS population count. Relative to ACS data, these merged administrative data capture more workers with annual earnings below $10,000 and fewer workers with earnings above $10,000. Our method of augmenting employment records with demographic indicators yields a workforce composition with age, gender, and ethnoracial (race/ethnicity) characteristics similar to those observed in U.S. Census Bureau ACS records. We conclude with implications for researchers hoping to use similar state administrative data sources.

Administrative records maintained by state governments to administer unemployment insurance (UI) programs provide a valuable resource for economic and policy research.1 These records contain rich longitudinal, worker-level information on employment histories and earnings alongside employer-level information on sector and firm size. However, a key limitation of UI employment records is that they lack data on demographic characteristics of workers, including sex, age, and racial and ethnic identity. Such individual demographic information is important for understanding both the overall functioning of the economy and possible heterogeneity in labor market experiences and the impact of policies.2

This article describes the development of a new state-level data resource based on UI data that links employment records to demographic identifiers from other state agencies. This work is similar to extant state and federal integrated data efforts, but it uses novel state-level data sources and methods.3 We show how such demographic data can be assembled, merged, and benchmarked to estimates from the American Community Survey (ACS) conducted by the U.S. Census Bureau. We find that we can replicate the overall population of workers to within 1 to 4 percent of a comparable population in the ACS, depending on the year. Relative to ACS data, these merged administrative records capture more workers with annual earnings below $10,000 and fewer workers with earnings above $10,000. Our methods of augmenting employment records with demographic indicators yield a workforce composition with age, gender, and ethnoracial characteristics similar to those observed in ACS estimates.4 We conclude with implications for researchers hoping to use similar state administrative records.

Background: labor market data sources

Surveys commonly provide data for research and evaluation by linking demographic characteristics to labor market outcomes. Researchers have relied on survey datasets, such as the Current Population Survey (CPS), the Survey of Income and Program Participation, the Panel Study of Income Dynamics (PSID), and the National Longitudinal Survey of Youth, to understand how demographic identities intersect with the labor market. For example, researchers have used the CPS to assess racial and ethnic disparities in labor market outcomes.5 Research using the PSID has studied the influence of workforce aging on trends in labor turnover rates.6 Survey data have been used to estimate the heterogeneous effects of labor policy changes, such as how minimum wage laws differentially affected workers in different age and race groups.7

However, survey datasets have several problems and limitations. For example, these sources are subject to concerns about recall bias, social desirability bias, and nonresponse bias; these problems have been increasing in recent years.8 Further, sample size considerations with surveys often limit researchers from conducting detailed subgroup analyses using these data, so we have limited knowledge about labor market outcomes for smaller groups. For example, data sources have limited information on the number of Asian American or Native American workers or the number of workers living in specific rural regions. The existing literature has also been limited in the types of employment histories that can be examined, because it is very costly for survey data sources to collect longitudinal data. For example, survey datasets rarely identify detailed employment histories or track changes in hours and wages at the job level.

Administrative data, such as UI employment records, can address these issues and could be particularly powerful analytically if paired with demographic information on workers. The U.S. Census Bureau’s Longitudinal Employer–Household Dynamics (LEHD) effort is a prominent federal use of administrative records for tracking worker demographics.9 The Census Bureau collects UI records from states and merges them with demographic information collected through its surveys, including the decennial censuses and other government records. The Census Bureau compiles these data across all participating states, and almost all states contribute: as of October 2022, 46 states plus Puerto Rico were partners in the LEHD effort.10 LEHD data form the basis of state-level demographic summaries of worker characteristics titled the Quarterly Workforce Indicators, which are released with approximately a three-quarter delay. The Census Bureau also creates publicly available summary data based on the LEHD on topics that include transitions in and out of jobs, employment outcomes for different levels of postsecondary education, and commute information (origin-destination data).11 Researchers can apply to use restricted LEHD microdata within Federal Statistical Research Data Centers.

In this article, we describe a state-level alternative to survey or LEHD data; this alternative is a project linking Washington State UI employment records with demographic microdata. Our effort has many of the advantages of LEHD data and also includes other types of information that are not included in LEHD, such as records of births and public assistance use. As such, it bears similarity to state-level integrated administrative data efforts, most notably Wisconsin’s Administrative Data Core, Washington State’s Integrated Client Database, or the use cases presented by Hawn Nelson and colleagues.12 However, the data described in this article are unique among state-level efforts both in their initial aim of capturing all employees within the state and in some of the data sources included, namely voting and driver’s license records.

The outline of this article is as follows. We begin with an overview of the larger data assembly process that involved merging data across multiple state agencies. We describe the sources and nature of the data and explain how we processed the data, including imputing some ethnorace information. We then benchmark our work against ACS data, which demonstrates that our administrative data estimates across age, sex, and ethnorace correspond closely to ACS estimates. We conclude with a discussion of the strengths and limitations of such a state-level administrative data resource. This work has relevance for researchers working in other states, where similar records are typically collected by state agencies and could be combined to generate a resource to answer important questions about workers, firms, and the effects of labor policies.

Washington State administrative data

In this section, we review the data sources from Washington State agencies that we merged in order to create a longitudinal administrative dataset. These sources include the following: Employment Security Department unemployment insurance records, Department of Licensing records, Secretary of State records, Department of Social and Health Services records, and Department of Health records.

Creating the Washington Merged Longitudinal Administrative Data

The linkages between UI employment records and demographic indicators in Washington State are part of a larger project, the Washington Merged Longitudinal Administrative Data (WMLAD) effort, commissioned by University of Washington researchers and assembled in collaboration with multiple state agencies. The Washington State Institutional Review Board approved this data construction as part of a project to examine the impact of Seattle’s 2014 minimum wage ordinance and to oversee data sharing agreements between the different agencies and the research team. WMLAD records use a consistent and unique person identifier, which allows researchers to merge information from different agencies and follow individuals over time. All datasets cover the period from 2010 to 2016, but a few datasets cover longer periods. Together, these records contain information on over 10 million individuals, which represents a near census of Washington State’s adult residents during this time.13

The University of Washington team contracted with the Research and Data Analysis (RDA) division of Washington State’s Department of Social and Health Services (DSHS) to perform the matching required to construct WMLAD. This work centered around the construction of a unique numeric individual identifier that could be used to link Washingtonians across data sources. While some files contain numeric identifiers such as Social Security numbers, other files are less straightforward to match because individuals are only identified based on nonunique identifiers, such as name and birth date. Under the terms of the data sharing agreement, RDA performed the data matching and transferred deidentified data to the research team.

RDA has the capacity to perform high-level, complex matching because of its two-plus decades of experience in building and maintaining the DSHS Integrated Client Database, which combines records across DSHS administrations with outside records from several other agencies.14 The project used the Link King record linkage program, a public-domain application that runs on the data analysis program SAS.15 Link King uses a combination of deterministic and probabilistic matching algorithms. The linking process included data standardization, blocking into similar records, identity matching, and quality review. Quality control measures included careful implementation of linkage software with an established pedigree in which decision rules were modified for each linkage task based on manual review of random samples of identity groupings. We were able to confirm the validity of the linkage results by showing that they were generally consistent with expectations based on known overlap in clients from linked data sources.
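The stages named above (standardization, blocking, and identity matching) can be illustrated with a minimal sketch. This is not Link King's algorithm or RDA's procedure: the record fields (`id`, `name`, `birth_date`, `ssn`), the similarity threshold, and the use of Python's `difflib` string similarity as a stand-in for probabilistic scoring are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def standardize(record):
    """Normalize name casing and whitespace so equivalent spellings compare equal."""
    return {**record, "name": " ".join(record["name"].upper().split())}


def link(records_a, records_b, threshold=0.85):
    """Return (id_a, id_b, rule) tuples for record pairs judged to be the same person."""
    a = [standardize(r) for r in records_a]
    b = [standardize(r) for r in records_b]

    # Blocking: only compare records that share a birth date.
    blocks = defaultdict(list)
    for rec in b:
        blocks[rec["birth_date"]].append(rec)

    links = []
    for rec_a in a:
        for rec_b in blocks.get(rec_a["birth_date"], []):
            # Deterministic rule: identical, non-missing numeric identifier.
            if rec_a.get("ssn") and rec_a.get("ssn") == rec_b.get("ssn"):
                links.append((rec_a["id"], rec_b["id"], "deterministic"))
                continue
            # Fuzzy rule (a simple stand-in for probabilistic scoring).
            score = SequenceMatcher(None, rec_a["name"], rec_b["name"]).ratio()
            if score >= threshold:
                links.append((rec_a["id"], rec_b["id"], "fuzzy"))
    return links


# Hypothetical toy records; none of these values come from WMLAD.
ui = [
    {"id": "u1", "name": "Maria  Lopez", "birth_date": "1980-02-01", "ssn": "123"},
    {"id": "u2", "name": "John Smith", "birth_date": "1975-06-30", "ssn": None},
]
licenses = [
    {"id": "d1", "name": "maria lopez", "birth_date": "1980-02-01", "ssn": "123"},
    {"id": "d2", "name": "Jon Smith", "birth_date": "1975-06-30", "ssn": None},
    {"id": "d3", "name": "Ann Lee", "birth_date": "1975-06-30", "ssn": None},
]
matches = link(ui, licenses)
```

In practice, the threshold and decision rules would be tuned through manual review of sampled matches, as the quality control description above indicates.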

Data sources in the Washington Merged Longitudinal Administrative Data

UI employment records from the Washington State Employment Security Department form the core of the current analysis. WMLAD augments these employment records with demographic information drawn from four different sources: driver’s license records, voter records, public assistance program records, and birth records. We describe each data source, and table 1 summarizes the demographic data available from each.

Table 1. Demographic data availability in Washington Merged Longitudinal Administrative Data

Washington State agency | Record type | Number of individuals from 2010 to 2016 | Age and sex data | Reported race and ethnicity data | Ability to impute race and ethnicity | Share of workers included from 2010 to 2016 (in percent)
Employment Security Department | Employment records for jobs eligible for unemployment insurance | 6,134,460 workers | No | No | No | 100.0
Department of Licensing | Driver's license registration records | 7,597,007 licensees | Yes | No | Yes | 76.4
Secretary of State | Voter registration and voting history records | 5,869,390 voters | Yes | No | Yes | 53.8
Department of Social and Health Services | Client program participation and benefit receipt records | 4,962,086 clients | Yes | Yes | Yes | 42.1
Department of Health | Records on parents listed on Washington State birth certificates | 896,588 parents on birth certificates | Yes | Yes | Yes | 10.7

Source: Records from the following Washington State agencies: Department of Licensing, Secretary of State, Department of Social and Health Services, Employment Security Department, and Department of Health.

Employment Security Department UI records

UI records from the state’s Employment Security Department report quarterly the job-level hours worked and wages earned among workers employed in Washington State and eligible for UI. The granularity of these data enables in-depth analysis of labor market trends and the effects of policies on the labor market. Notably, Washington State is one of only eleven states that report data on hours worked in these records, which allows researchers to study wage rates and labor force attachment in ways that would not be possible with more limited data in other states.16 Records are linked to firm identifiers, and this allows researchers to link worker histories to important employer characteristics, such as North American Industry Classification System codes, firm size, and firm geographic location. However, UI records do not include any demographic information on age, sex, race, or ethnicity.

Department of Licensing records

Driver’s license records from the Department of Licensing report age and sex for those with an active Washington State driver’s license between 2005 and 2016. These records also report last name and residential address information, which is used to impute race and ethnicity.

Secretary of State voter records

Voter records from the Washington Secretary of State report age and sex for the population of registered voters between 2006 and 2016. These records also contain last name and residential address information and include monthly indicators of whether an individual voted in a given month. Voting in Washington State has occurred exclusively by mail since 2011, and many counties implemented vote by mail even earlier. Therefore, Washington State voters are incentivized to keep their address current in the database so that they will receive their ballot by mail. Hence, address records in Washington State’s voter data are likely to be more accurate than in states with in-person voting.

Department of Social and Health Services client records

DSHS records report age, sex, ethnorace, and residential address for clients of social assistance programs, including Basic Food (the name of the Supplemental Nutrition Assistance Program in Washington State), Temporary Assistance for Needy Families, medical assistance programs, Working Connections Child Care, child support, and Supplemental Security Income. Ethnorace information in these records is primarily self-reported by clients. However, if clients do not provide this information, it is sometimes reported based on case worker observation.

Department of Health birth records

Records from Washington State’s Department of Health (DOH) report age, sex, and ethnorace data for parents listed on birth certificates of babies born in Washington State between 2010 and 2016. These records also report the residential addresses of the parents at the time of the birth.

Assembling reported demographic data

Using the WMLAD unique identifier, we assemble demographic data on individuals across the data sources. If reported demographic data are in conflict across or within data sources, then we choose a single value as follows. First, we collapse the records within a data source by person and demographic characteristic. For example, a person with two different values of age within a particular dataset would have two rows for this dataset after collapsing. Then, we append these collapsed records from all data sources together and identify the modal value for each person. For age, if there is no unique modal response, we take the average if the range of reported ages is less than or equal to five years. If there is no unique modal response and the age range is greater than five years, we then prioritize information in the following order: Department of Licensing, Secretary of State, Department of Social and Health Services, then Department of Health. This priority order is based on our understanding of the accuracy of the different data sources. If multiple different values are reported in the highest-priority source, we prioritize more recent observations from that source.
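The conflict-resolution rule for age can be sketched as follows. This is an illustrative reading, not the authors' code. It assumes that ages are measured at a common reference date, that the average and the range are taken over the tied modal values, and that the priority rule considers only those tied values. The source labels (`DOL` for the Department of Licensing, `SOS`, `DSHS`, `DOH`) and the input layout are hypothetical.

```python
from collections import Counter

# Priority order stated in the text, using hypothetical short labels.
PRIORITY = ["DOL", "SOS", "DSHS", "DOH"]


def resolve_age(observations):
    """Pick one age for a person from (source, year_observed, age) tuples."""
    if not observations:
        return None

    # Collapse: one row per distinct (source, age), keeping the latest year seen.
    latest = {}
    for source, year, age in observations:
        key = (source, age)
        latest[key] = max(year, latest.get(key, year))

    # Modal value across the appended, collapsed rows.
    counts = Counter(age for (_source, age) in latest)
    top = max(counts.values())
    modes = sorted(age for age, n in counts.items() if n == top)
    if len(modes) == 1:
        return modes[0]

    # No unique mode: average when the tied values span five years or less.
    if modes[-1] - modes[0] <= 5:
        return sum(modes) / len(modes)

    # Otherwise use the highest-priority source, preferring recent observations.
    for source in PRIORITY:
        candidates = [
            (year, age)
            for (src, age), year in latest.items()
            if src == source and age in modes
        ]
        if candidates:
            return max(candidates)[1]
    return None
```

For example, `resolve_age([("DSHS", 2012, 30), ("DOL", 2011, 40)])` has no unique mode and a ten-year spread, so the Department of Licensing value is used.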

Imputing race and ethnicity

We augment data on reported race and ethnicity by imputing race and ethnicity for an additional subset of the population. We do this by combining information on residential location and last name using the Bayesian Improved Surname Geocoding (BISG) method.17 The BISG method combines information on an individual’s last name and residential address location to generate the probability that a given individual identifies with each of six ethnorace categories. These probabilities are estimated in the model based on two main calculations that are proposed in “Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities” by Elliott et al. and code from the Consumer Financial Protection Bureau.18 We impute this information as follows. First, the probability of belonging to race and ethnicity r given surname s, p(r|s), is calculated based on 2010 decennial census data on the proportion of the population with a given surname identifying as each ethnoracial group. Second, the proportion of people of race and ethnicity r who live in geographic area g, q(g|r), is calculated using Census Bureau data on the ethnoracial breakdown of a given census tract or block group. Then, these two probabilities are combined using Bayes’ theorem to calculate the probability an individual belongs to a given group given last name and residential location:

$$p(r \mid s, g) = \frac{p(r \mid s)\, q(g \mid r)}{\sum_{r' \in R} p(r' \mid s)\, q(g \mid r')},$$

in which r is a given ethnorace category among set R {non-Hispanic White; non-Hispanic Black; non-Hispanic American Indian or Alaska Native (AIAN); non-Hispanic Asian or Pacific Islander (API); non-Hispanic multiracial or other race; or Hispanic of any race}, g represents residence in a given census geography, and s represents a given surname. The resulting probabilities can be used to weight population estimates by ethnoracial group or can be used to assign each individual a category based on the highest probability group.
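The Bayes' theorem step can be illustrated with a short sketch that multiplies the surname-based probability p(r|s) by the geography-based proportion q(g|r) for each category and then normalizes across categories. The function name and all numeric inputs are hypothetical; real inputs would come from the census surname list and tract or block group tabulations described above.

```python
def bisg_posterior(p_r_given_s, q_g_given_r):
    """Combine p(r|s) and q(g|r) into p(r|s,g) by Bayes' theorem."""
    unnormalized = {r: p_r_given_s[r] * q_g_given_r[r] for r in p_r_given_s}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no category has positive probability")
    return {r: value / total for r, value in unnormalized.items()}


# Hypothetical inputs for one surname and one census geography.
p_r_given_s = {
    "White": 0.05, "Black": 0.01, "AIAN": 0.01,
    "API": 0.02, "Multiracial or other": 0.01, "Hispanic": 0.90,
}
q_g_given_r = {
    "White": 0.0002, "Black": 0.0004, "AIAN": 0.0001,
    "API": 0.0003, "Multiracial or other": 0.0002, "Hispanic": 0.0010,
}

posterior = bisg_posterior(p_r_given_s, q_g_given_r)

# Either keep the full distribution as weights or assign the most likely group.
assigned = max(posterior, key=posterior.get)
```

Keeping the full distribution supports weighted population estimates, while `assigned` corresponds to the highest-probability assignment mentioned in the text.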

For the subset of our records that also have reported ethnorace information, we can assess the quality of the BISG imputation.19 For this group, the overall predicted distribution of ethnoracial groups matches the distribution in the reported data quite well. The composition of the population according to the reported data, with the BISG imputed breakdown in parentheses, is as follows: White, 68 percent (65 percent); Black, 5 percent (5 percent); AIAN, 1 percent (1 percent); API, 7 percent (7 percent); Hispanic, 15 percent (17 percent); and multiracial or other, 4 percent (4 percent). However, the accuracy of the imputation at the individual level varies widely by group. For example, among persons who reported as White in the DOH and DSHS data, the BISG method generates a predicted probability of being White of 87 percent, on average. In contrast, among persons reported to be Black, the BISG method predicts their probability of being Black to be only 31 percent, on average. The BISG method is most accurate for White, API, and Hispanic individuals and less accurately imputes ethnorace for Black, AIAN, and multiracial or other individuals.
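The individual-level accuracy figures quoted above are averages of the predicted probability of a person's reported group. A minimal sketch of that calculation follows; the input layout and the toy values are assumptions and are not drawn from WMLAD.

```python
from collections import defaultdict


def mean_prob_for_reported_group(people):
    """Average predicted probability of each person's own reported group.

    `people` is an iterable of (reported_group, {group: predicted_probability}).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for reported, probs in people:
        totals[reported] += probs.get(reported, 0.0)
        counts[reported] += 1
    return {group: totals[group] / counts[group] for group in totals}


# Hypothetical toy sample with two categories.
sample = [
    ("White", {"White": 0.9, "Black": 0.1}),
    ("White", {"White": 0.8, "Black": 0.2}),
    ("Black", {"White": 0.7, "Black": 0.3}),
]
accuracy = mean_prob_for_reported_group(sample)
```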

Availability of demographic information for workers in UI employment records

Once demographic data from across state administrative data sources are assembled, which includes both reported and imputed measures, we assess the extent to which this demographic information covers the working population in the UI data. Table 1 shows the extent to which each source of demographic data covers the population of workers in the UI employment records.

UI employment records include any worker with nonzero earnings from UI-eligible employment occurring any time between 2010 and 2016. For each data source (driver’s license records, voting records, DSHS client records, and birth records), we report the percent of this employed population with data on sex, age, and race and ethnicity from that data source in table 2. We report separate estimates for workers for whom we have reported ethnorace data only, imputed ethnorace data and no reported ethnorace data, and imputed and/or reported ethnorace data. We also distinguish between workers for whom we only have demographic information from each specific data source versus workers for whom we also have demographic information from one or more of the other three data sources. For example, about 21 percent of workers have data on age that are only reported in driver’s license records, over 57 percent of workers have data on age reported in driver’s license records and another data source, and about 22 percent of workers are either not in the driver’s license records or do not have an age value reported in these records. Finally, table 2 also shows the overall availability of demographic information on workers across all sources. In these results, we know the age and sex for roughly 81 percent of workers. Ethnorace is reported for a minority of workers (about 41 percent), but the more extensive availability of data on last name and residential location allows us to assign ethnorace for over 76 percent of workers.
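The three coverage statuses used for each source can be expressed as a small classification. The sketch below assumes a hypothetical structure that records, for each worker and one attribute such as age, the set of sources reporting a value. The source labels are illustrative.

```python
from collections import Counter


def coverage_status(sources_with_value, source):
    """Classify one worker's coverage for one attribute relative to one source."""
    if source not in sources_with_value:
        return "Not available in source"
    if len(sources_with_value) == 1:
        return "Uniquely in source"
    return "In source and other sources"


def coverage_shares(workers, source):
    """Percent of workers in each status; `workers` maps worker id to a set of sources."""
    counts = Counter(coverage_status(s, source) for s in workers.values())
    total = sum(counts.values())
    return {status: 100 * n / total for status, n in counts.items()}


# Hypothetical toy input: which sources report age for each worker.
age_sources = {
    "w1": {"DOL"},
    "w2": {"DOL", "SOS"},
    "w3": {"DSHS"},
    "w4": set(),
}
shares = coverage_shares(age_sources, "DOL")
```

Because every worker falls into exactly one status per source, the three shares for a source sum to 100 percent, apart from rounding.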

Table 2. Demographic coverage of Washington Merged Longitudinal Administrative Data employed population from administrative data sources (in percent)

Agency | Data availability status | Age | Sex | Reported race/ethnicity | Imputed but not reported race/ethnicity | Imputed and/or reported race/ethnicity
Department of Licensing | Not available in source | 22.34 | 22.34 | 100.00 | 31.13 | 31.13
Department of Licensing | In source and other sources | 57.10 | 54.79 | 0.00 | 61.08 | 61.08
Department of Licensing | Uniquely in source | 20.56 | 22.87 | 0.00 | 7.79 | 7.79
Secretary of State | Not available in source | 74.02 | 74.09 | 100.00 | 50.50 | 50.50
Secretary of State | In source and other sources | 25.97 | 25.91 | 0.00 | 49.44 | 49.44
Secretary of State | Uniquely in source | 0.00 | 0.00 | 0.00 | 0.06 | 0.06
Department of Social and Health Services | Not available in source | 57.89 | 57.85 | 62.91 | 97.08 | 59.99
Department of Social and Health Services | In source and other sources | 39.29 | 39.20 | 6.18 | 2.75 | 35.10
Department of Social and Health Services | Uniquely in source | 2.82 | 2.95 | 30.90 | 0.17 | 4.91
Department of Health | Not available in source | 89.64 | 100.00 | 89.68 | 99.67 | 89.35
Department of Health | In source and other sources | 10.35 | 0.00 | 6.08 | 0.33 | 10.26
Department of Health | Uniquely in source | 0.01 | 0.00 | 4.24 | 0.00 | 0.39
All sources | Not available | 19.33 | 19.35 | 58.67 | 65.05 | 23.72
All sources | Available | 80.67 | 80.65 | 41.33 | 34.95 | 76.28

Notes: The “Reported race/ethnicity” column encompasses all workers who had information on reported race/ethnicity available, regardless of whether they also had imputed information available. The Washington Merged Longitudinal Administrative Data employed population in table 2 consists of all workers with nonzero earnings with jobs that are eligible for unemployment insurance at any point between 2010 and 2016. It is not restricted to workers who had Washington State addresses.  

Source: Authors’ calculations based on records from the following Washington State agencies: Department of Licensing, Secretary of State, Department of Social and Health Services, Employment Security Department, and Department of Health.

The results summarized in table 2 illustrate the different strengths and weaknesses of each data source. The driver’s license records cover the largest share of the employed population by far, and these records also provide information on a substantial share of workers who are not covered by other sources. However, a key disadvantage of the licensing records is that they do not contain race and ethnicity data. For reported race and ethnicity data, the DSHS records cover the most workers, followed by DOH records. The DSHS and voter records each cover a fairly substantial share of the employed population, but they also represent specific populations that are likely not representative of the working population at large. Overall, WMLAD suggests that merging multiple administrative data sources holds the most promise for attaching demographic data to employment records.

Benchmarking of WMLAD administrative data to American Community Survey estimates

Next, we examine the extent to which the assembled data on UI-covered wage earners mirrors other evidence about the state workforce, both overall and by demographic groups. In order to assess the accuracy, to the best of our ability, of WMLAD’s demographics-enhanced UI employment records, we use the ACS Public Use Microdata Sample to create a benchmark population. This exercise addresses the following questions. How do the WMLAD counts of workers compare to counts constructed from ACS data? How does the income distribution compare across the two sources? How do the WMLAD and ACS worker populations compare across the demographic dimensions of age, race, ethnicity, and sex?

Comparing WMLAD and ACS population definitions

The first step of the benchmarking exercise is to align the definitions of “workers” across the ACS and UI systems. Our comparison uses specifications of the ACS and WMLAD populations that we believe represent the most conceptually similar sets of populations, although some unavoidable differences remain.

We begin with two approximations of the employed population, which we broadly define as follows: all workers with nonzero earnings in the Employment Security Department UI wage records and in the ACS. In the ACS data, earnings include all pretax wage and salary income, which is comparable to the wages described in the UI records.20 The UI employment records do not generally include self-employed workers, so we filter out all self-employed workers from the ACS data. Further, the two data sources have different inclusion criteria based on residency: the UI records describe employment for workers who are employed in Washington State, whereas the ACS includes employment of workers who reside in Washington State. There are likely a substantial number of Idaho and Oregon residents who are included in the Washington State Employment Security Department’s UI data but are not part of the ACS sampling frame because they do not reside in Washington State. To address this issue, we filter the UI records to workers with a known Washington State address based on other WMLAD sources. That is, we include workers who had an address in the driver’s license, voter, DSHS client, or birth databases between 2010 and 2016. There are also Washington State residents who work in Idaho or Oregon and are therefore included in the ACS sampling frame but not the Washington State UI records. We address this issue by filtering the ACS to exclude all workers whose place of work was a location other than Washington State.
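The population alignment described above amounts to a pair of filters, one on each data source. The sketch below is illustrative only: the field names (`wage_income`, `self_employed`, `place_of_work_state`, `annual_earnings`, `person_id`) are hypothetical and do not correspond to actual ACS or WMLAD variable names.

```python
def acs_benchmark(acs_rows):
    """ACS side: nonzero wages, not self-employed, place of work in Washington State."""
    return [
        row for row in acs_rows
        if row["wage_income"] != 0
        and not row["self_employed"]
        and row["place_of_work_state"] == "WA"
    ]


def wmlad_benchmark(ui_workers, wa_address_ids):
    """UI side: nonzero earnings and a known Washington State address in another source."""
    return [
        worker for worker in ui_workers
        if worker["annual_earnings"] != 0
        and worker["person_id"] in wa_address_ids
    ]


# Hypothetical toy rows.
acs_rows = [
    {"wage_income": 42000, "self_employed": False, "place_of_work_state": "WA"},
    {"wage_income": 30000, "self_employed": True, "place_of_work_state": "WA"},
    {"wage_income": 55000, "self_employed": False, "place_of_work_state": "OR"},
    {"wage_income": 0, "self_employed": False, "place_of_work_state": "WA"},
]
ui_workers = [
    {"person_id": 1, "annual_earnings": 18000},
    {"person_id": 2, "annual_earnings": 61000},  # no Washington address on file
    {"person_id": 3, "annual_earnings": 0},
]
wa_address_ids = {1, 3}

acs_population = acs_benchmark(acs_rows)
wmlad_population = wmlad_benchmark(ui_workers, wa_address_ids)
```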

In comparing ACS workers who are not self-employed with WMLAD workers with a Washington State address, we believe there is one major substantive discrepancy that remains between the two data sources: the ACS still includes non-UI-eligible employment, such as some agricultural and maritime work, work conducted in certain institutional sectors, or domestic workers who work on an employment basis.21 We do not believe that further refinement is possible given the nuances of the UI coverage rules.22 Back-of-the-envelope calculations demonstrate that population counts of our preferred WMLAD specification are consistently 95 percent to 99 percent of our preferred ACS specification in any given year. We believe this gap most likely represents individuals who work on an employment basis but are not eligible for UI. In addition to this explicit discrepancy in the populations covered by each data source, other discrepancies may emerge based on issues such as data errors, reporting biases, and nonresponse biases. In the appendix, we describe our process of constructing and comparing different specifications of the employed population from these two data sources in more detail.

Employed population by annual earnings: WMLAD versus ACS

How does WMLAD’s earnings distribution compare to reported earnings in the ACS? Using our preferred specifications of the working population from both WMLAD and the ACS, we next compare total annual earnings across workers in these two populations. Table 3 displays this distribution for 2013, the midpoint of our 2010 to 2016 range.23

Table 3. Employed population by annual earnings, Washington Merged Longitudinal Administrative Data and American Community Survey, 2013

Earnings category | American Community Survey: number of workers | American Community Survey: percent of workers | Washington Merged Longitudinal Administrative Data: number of workers | Washington Merged Longitudinal Administrative Data: percent of workers
Less than $5,000 | 337,181 | 10.39 | 439,501 | 13.99
$5,000 – $9,999 | 233,087 | 7.18 | 258,372 | 8.22
$10,000 – $14,999 | 225,127 | 6.94 | 224,525 | 7.15
$15,000 – $19,999 | 222,207 | 6.85 | 214,547 | 6.83
$20,000 – $24,999 | 236,763 | 7.30 | 199,433 | 6.35
$25,000 – $29,999 | 189,414 | 5.84 | 186,445 | 5.93
$30,000 – $34,999 | 217,902 | 6.71 | 178,628 | 5.68
$35,000 – $39,999 | 175,864 | 5.43 | 164,220 | 5.23
$40,000 – $44,999 | 179,660 | 5.54 | 147,504 | 4.69
$45,000 – $49,999 | 138,644 | 4.27 | 132,083 | 4.20
$50,000 – $54,999 | 164,045 | 5.05 | 116,846 | 3.72
$55,000 – $59,999 | 94,857 | 2.92 | 99,279 | 3.16
$60,000 – $69,999 | 191,598 | 5.90 | 168,407 | 5.36
$70,000 – $79,999 | 135,408 | 4.17 | 131,879 | 4.20
$80,000 – $89,999 | 108,001 | 3.33 | 98,756 | 3.14
$90,000 – $99,999 | 83,452 | 2.57 | 74,376 | 2.37
$100,000 or more | 312,203 | 9.62 | 307,544 | 9.79

Note: The American Community Survey employed population consists of workers with nonzero earnings who were not self-employed in the reference period and did not work outside of Washington State. The Washington Merged Longitudinal Administrative Data employed population consists of workers with nonzero earnings who had a Washington State address in the Department of Licensing, Secretary of State, Department of Social and Health Services, or Department of Health databases between 2010 and 2016. Washington Merged Longitudinal Administrative Data reports all earnings within the 2013 calendar year, while American Community Survey reports earnings during the 12-month period prior to the survey response month. The increase from the band starting at $55,000 to the one starting at $60,000 is because we use $10,000 intervals from that point forward. 

Source: Authors’ calculations based on the American Community Survey Public Use Microdata Sample, 2010–2016, and records from the following Washington State agencies: Department of Licensing, Secretary of State, Department of Social and Health Services, Department of Health, and Employment Security Department.

The overall distribution of workers by annual earnings is quite similar between these two data sources, with a few notable discrepancies. For example, in 2013, WMLAD captures just over 100,000 more workers earning less than $5,000 compared with the ACS. Roughly 14 percent of WMLAD workers earn less than $5,000 compared with just over 10 percent of ACS workers. This aligns with research suggesting that survey estimates of income can be erroneous because of various response biases or nonrandom nonresponses.24 Workers who are very weakly connected to the labor market may not think to report their earnings to the ACS or may be less likely to end up in the ACS because of nonresponse patterns. It is also possible that some of these workers had additional earnings that were not captured in the unemployment insurance records, resulting in WMLAD underestimating their earnings.

At higher earnings levels, particularly between $20,000 and $70,000, the ACS tends to report more workers in almost every earnings bracket. We suspect this reflects the fact that the ACS includes a set of workers who are ineligible for unemployment insurance. Finally, an interesting pattern emerges as we break out $10,000 earnings increments into two equal $5,000 increments (e.g., $10,000 to $14,999 and $15,000 to $19,999). While WMLAD exhibits a relatively smooth earnings distribution across these increments, ACS respondents seem to exhibit a bias toward reporting round numbers. Within each $10,000 increment, respondents in the ACS population are relatively more likely to report earnings in the half of the increment containing the round multiple of $10,000 (e.g., $10,000 to $14,999) and less likely to report earnings in the second half of the increment (e.g., $15,000 to $19,999; see table 3). We are unable to verify whether this is a result of rounding up or down, but respondent bias toward round numbers has been consistently reported in survey data and suggests an advantage of administrative data in achieving increased precision.25
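The half-interval comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the interval bounds, and both input lists are hypothetical and made up for the example; they are not WMLAD or ACS values. The sketch reports, for each $10,000 interval, the share of workers whose earnings fall in the lower $5,000 half (the half that contains the round multiple of $10,000). A share well above one-half is the pattern that would suggest heaping on round numbers.

```python
from collections import Counter

def lower_half_shares(earnings, width=10_000, start=10_000, stop=60_000):
    """For each `width`-dollar interval between `start` and `stop`, return the
    share of workers whose earnings fall in the lower half of the interval
    (the half that contains the round multiple of `width`)."""
    lower = Counter()
    total = Counter()
    for e in earnings:
        if not (start <= e < stop):
            continue
        base = int(e // width) * width      # e.g., 23,500 -> 20,000
        total[base] += 1
        if e - base < width / 2:            # e.g., 20,000 to 24,999
            lower[base] += 1
    return {base: lower[base] / total[base] for base in sorted(total)}

# Made-up illustrative values (hypothetical, not WMLAD or ACS data).
survey_like = [20_000, 20_000, 20_000, 22_000, 26_500,
               30_000, 30_000, 31_000, 38_000]
admin_like = [20_150, 21_730, 23_410, 25_020, 27_890, 29_640,
              30_480, 32_950, 35_110, 37_760]

print(lower_half_shares(survey_like))   # shares above 0.5 suggest heaping
print(lower_half_shares(admin_like))    # shares near 0.5 suggest a smooth distribution
```

A check of this kind cannot distinguish rounding up from rounding down; it only flags an excess of mass in the half-interval that contains the round number.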

Employed population by demographic characteristics: WMLAD versus ACS

Next, we use the merged datasets in WMLAD to describe the demographic characteristics of the employed population captured by the UI employment records. We benchmark the merged demographic data in WMLAD against ACS estimates. We again use our preferred specifications of the employed population in WMLAD and ACS (as described in the appendix) in all analyses below.

Age groups

First, we assess how the age distribution of the ACS employed population compares with the WMLAD employed population. Table 4 presents the age breakdown of the 2013 employed population in the ACS and in WMLAD. The distribution of workers by age is remarkably similar across the two sources. WMLAD tends to have a slightly higher proportion of workers ages 16 to 17, 23 to 34, 45 to 49, and 65 to 69. In contrast, the ACS has a slightly larger proportion of workers ages 18 to 22, 35 to 44, 50 to 59, and 70 and older. The two sources have a nearly equal percentage of workers ages 60 to 64. However, all of these differences are quite small, and the distribution of age ranges across these two sources is similar.

Table 4. Employed population by age, Washington Merged Longitudinal Administrative Data and American Community Survey, 2013
Age | American Community Survey, percent of workers | Washington Merged Longitudinal Administrative Data, percent of workers
16–17 | 1.08 | 1.34
18–22 | 9.93 | 9.87
23–29 | 16.57 | 16.80
30–34 | 11.52 | 11.60
35–39 | 10.24 | 10.04
40–44 | 10.50 | 10.17
45–49 | 9.76 | 9.88
50–54 | 10.33 | 10.28
55–59 | 9.27 | 9.18
60–64 | 6.45 | 6.44
65–69 | 2.83 | 2.92
70 and older | 1.52 | 1.49

Note: The American Community Survey employed population consists of workers with nonzero earnings who were not self-employed in the reference period and did not work outside of Washington State. The Washington Merged Longitudinal Administrative Data employed population consists of workers with nonzero earnings who had a Washington State address in the Department of Licensing, Secretary of State, Department of Social and Health Services, or Department of Health databases between 2010 and 2016. Data entries that are missing data on the age of a worker, consisting of 0.01 percent of workers with Washington State addresses, are excluded from the Washington Merged Longitudinal Administrative Data population in this table. Once we restrict the Washington Merged Longitudinal Administrative Data population to workers who had a Washington State address, the share of individuals missing information on age is much smaller than the share reported in table 2. This is because the data sources that indicate Washington residence (i.e., Department of Licensing, Secretary of State, Department of Social and Health Services, or Department of Health) all also report information on age, so age is missing much more sporadically for individuals who have a record of a Washington State address.

Source: Authors’ calculations based on the American Community Survey Public Use Microdata Sample, 2013, and records from the following Washington State agencies: Employment Security Department, Department of Licensing, Secretary of State, Department of Social and Health Services, and Department of Health.
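One way to summarize how close the two age distributions are is an index of dissimilarity, which is half the sum of the absolute differences between the two sets of percentages. The sketch below is illustrative only: the percentages are copied from table 4, while the choice of this summary measure and the function names are assumptions of the example, not part of the analysis reported in the article.

```python
# Percent of workers by age group, copied from table 4 (2013).
# Each tuple is (ACS percent, WMLAD percent).
AGE_SHARES = {
    "16-17": (1.08, 1.34),
    "18-22": (9.93, 9.87),
    "23-29": (16.57, 16.80),
    "30-34": (11.52, 11.60),
    "35-39": (10.24, 10.04),
    "40-44": (10.50, 10.17),
    "45-49": (9.76, 9.88),
    "50-54": (10.33, 10.28),
    "55-59": (9.27, 9.18),
    "60-64": (6.45, 6.44),
    "65-69": (2.83, 2.92),
    "70 and older": (1.52, 1.49),
}

def dissimilarity_index(shares):
    """Half the sum of absolute differences between two percent distributions.

    The result is in percentage points and can be read as the share of
    workers who would have to change groups for the distributions to match.
    """
    return 0.5 * sum(abs(a - b) for a, b in shares.values())

def largest_gap(shares):
    """Return (group, ACS minus WMLAD) for the group with the largest absolute gap."""
    group = max(shares, key=lambda g: abs(shares[g][0] - shares[g][1]))
    acs, wmlad = shares[group]
    return group, acs - wmlad

print(round(dissimilarity_index(AGE_SHARES), 3))
print(largest_gap(AGE_SHARES))
```

With the table 4 values, the index is below 1 percentage point, which is consistent with the description of the two age distributions as similar.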

Race and ethnicity

Next, we compare the composition of the working population by race and ethnicity in WMLAD versus the ACS. Table 5 depicts this comparison for all ethnorace categories. As we saw with age, the breakdown of the WMLAD working population by race and ethnicity is quite similar to the ACS estimates. The share of workers who identified as non-Hispanic White was nearly identical across the two data sources; these workers represent 72 percent of the employed population in each source. The WMLAD population has a comparatively higher proportion of non-Hispanic Black workers and workers identifying as multiracial, while the ACS population has a higher proportion of non-Hispanic Asian and Hispanic workers. The share of workers who were American Indian or Alaska Native was nearly equal in the estimates of the two data sources.

Table 5. Composition of employed population by race and ethnicity, Washington Merged Longitudinal Administrative Data and American Community Survey, 2013
Race and ethnicity | American Community Survey, percent of workers | Washington Merged Longitudinal Administrative Data, percent of workers
Black, non-Hispanic | 3.76 | 3.92
American Indian and Alaska Native, non-Hispanic | 1.00 | 1.01
Asian and Pacific Islander, non-Hispanic | 8.63 | 8.06
Other and multiracial, non-Hispanic | 3.30 | 4.23
Hispanic, any race | 11.32 | 10.75
White, non-Hispanic | 71.98 | 72.03

Note: The American Community Survey employed population consists of workers with nonzero earnings who were not self-employed in the reference period and did not work outside of Washington State. The Washington Merged Longitudinal Administrative Data employed population consists of workers with nonzero earnings who had a Washington State address in the Department of Licensing, Secretary of State, Department of Social and Health Services, or Department of Health databases between 2010 and 2016.

Source: Authors’ calculations based on the American Community Survey Public Use Microdata Sample, 2013, and records from the following Washington State agencies: Employment Security Department, Department of Licensing, Secretary of State, Department of Social and Health Services, and Department of Health.

Sex

In table 6, we compare the breakdown of the working population by sex in WMLAD and the ACS. The WMLAD employed population has a larger share of female workers than the ACS, though the estimates are again relatively similar. Both sources estimate that the workforce is majority male.

Table 6. Composition of employed population by sex, Washington Merged Longitudinal Administrative Data and American Community Survey, 2013
Sex | American Community Survey, percent | Washington Merged Longitudinal Administrative Data, percent
Male | 52.93 | 51.84
Female | 47.07 | 48.16

Note: The American Community Survey employed population consists of workers with nonzero earnings who were not self-employed in the reference period and did not work outside of Washington State. The Washington Merged Longitudinal Administrative Data employed population consists of workers with nonzero earnings who had a Washington State address in the Department of Licensing, Secretary of State, Department of Social and Health Services, or Department of Health databases between 2010 and 2016. In the Washington Merged Longitudinal Administrative Data, 0.01 percent of the working population with a Washington State address was missing data on sex; these workers are excluded from this table.

Source: Authors’ calculations based on the American Community Survey Public Use Microdata Sample, 2013, and records from the following Washington State agencies: Employment Security Department, Department of Licensing, Secretary of State, Department of Social and Health Services, and Department of Health.

Discussion

In this section, we discuss the uses of merging administrative data with UI records and the limits of merged datasets. We also consider the application of merged administrative data for agencies and researchers in other states.

Using administrative data to attach demographics to UI employment records

This work shows that combining data from four state administrative sources (records from driver licensing, voting, public assistance programs, and birth certificates) allows us to assign demographic characteristics to a majority of workers in UI employment data. Merging data across these sources yields sex and age indicators for 81 percent of workers in Washington State’s UI files and ethnorace indicators for 76 percent of this same population.

Comparing these merged data with ACS demographic characteristic estimates of the working population shows considerable alignment with a few notable exceptions. Our combined administrative data, WMLAD, identified more workers in the lowest earnings bracket than the ACS did. This speaks to the advantage of administrative data in capturing experiences of workers who are not strongly connected to the labor force. In addition, patterns of overreporting round earnings numbers in the ACS reveal another source of inaccuracy in survey data. Our administrative data matched the ACS quite closely for age, sex, and race and ethnicity. Younger workers, Black workers, and female workers were slightly overrepresented in WMLAD compared with the ACS. Middle-aged workers, Asian workers, Hispanic workers, and men were slightly overrepresented in the ACS compared with WMLAD.

Limitations of the merged administrative data approach

Like all data sources, our set of merged administrative data records has important limitations. Though we can identify demographic characteristics for a large share of the working population, there are still substantial amounts of missing data. For example, we are unable to identify age and sex for approximately 19 percent of the Washington State workers in the UI employment data, and we have neither imputed nor reported ethnorace data for 24 percent of this population. While survey data are not immune from the problem of item nonresponse on basic demographic characteristics, the scale of the problem tends to be far lower in most large surveys. For example, between 1 and 2 percent of ACS records are typically missing data on race and age, and less than 1 percent of records are missing data on sex.26

Relatedly, while disaggregating UI employment records by race and ethnicity can offer insights into racialized labor market exclusion, the reliance on imputation poses a drawback. In particular, America’s history of colonialism and slavery means that many Black and Native Americans have European last names, making the imputation process based on last name less accurate. Our team missed one opportunity to improve this process; we could also have used first names in our BISG imputation, which has been shown to improve imputation accuracy for Black Americans.27 Including more years of birth certificate data would have allowed us to include more self-reports of ethnoracial identity. Finally, starting in 2017, Washington State birth certificate data includes separate fields for tribal affiliation, which would allow for further disaggregation, conditional on tribal permissions per required data sovereignty practices.28
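To see why a surname that is common across several groups limits accuracy, it can help to look at the general form of a BISG-style update, in which a surname-based probability is combined with a geography-based likelihood through Bayes' rule. The sketch below shows only that generic calculation from the imputation literature; it is not the implementation used to build WMLAD, and the function name, group labels, and every probability in it are hypothetical values chosen for illustration.

```python
def bisg_posterior(p_group_given_surname, p_geo_given_group):
    """Combine a surname-based prior with geography via Bayes' rule.

    posterior(group) is proportional to
    P(group | surname) * P(living in this area | group).
    """
    unnormalized = {
        group: p_group_given_surname[group] * p_geo_given_group[group]
        for group in p_group_given_surname
    }
    total = sum(unnormalized.values())
    return {group: value / total for group, value in unnormalized.items()}

# Hypothetical probabilities for one surname and one neighborhood.
surname_prior = {"White": 0.60, "Black": 0.35, "Other": 0.05}
geography_likelihood = {"White": 0.001, "Black": 0.004, "Other": 0.001}

posterior = bisg_posterior(surname_prior, geography_likelihood)
for group, prob in posterior.items():
    print(f"{group}: {prob:.3f}")
```

In this made-up case the surname alone points weakly toward one group and the address shifts the estimate toward another. When the surname prior is this uninformative, the result depends heavily on geography, which is one reason additional inputs such as first names can help.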

Administrative data sources are less reliant on self-reported measures than survey data, which is a limitation as well as a strength. In measuring complex, socially constructed concepts, such as household composition or metrics that are not measured for the purpose of administering governmental programs, self-reported data can be a useful way to capture socially meaningful concepts as individuals understand them. However, people often misreport quantities such as earnings and income; in this case, the nonself-reported administrative data offers an advantage in accurately capturing these measures.

Finally, because no data source accurately measures the true characteristics of a population, there are limits to our comparison between WMLAD and the ACS. We can only compare the data sources to each other but cannot assess which one is the more accurate representation of on-the-ground reality. In addition, measurement issues and ambiguities in population and sample definitions mean that while we can guess the sources of discrepancy between WMLAD and the ACS, we cannot confidently state the reasons for these discrepancies. WMLAD may contain more very low earners because this population is less likely to be captured by ACS address-based sampling methods or because very low earners are less likely to recall or report their earnings when questioned by the ACS. The discrepancy may also stem from reporting errors that show up in the WMLAD–UI employment records, such as employers using invalid Social Security numbers for some employees, whether because those employees are immigrants without work authorization or for other reasons.

Implications for researchers in other states

No data source is “one size fits all.” The process and data described here yield a valuable resource for answering questions about economic and social processes in Washington State, and some parts of our process might be helpfully replicated for other questions or in other states. Toward this end, we offer some lessons learned.

This work illuminates the importance of carefully considering the population covered by each data source and the advantages and drawbacks of each agency’s data in light of the specific research aims. We therefore suggest weighing these features when prioritizing which data sources to acquire. For example, some sources may not provide much additional information compared with other sources but could require substantial time or funds to acquire.

In our case, while most adult Washingtonians have a driver’s license, many vote or receive benefits, and some are listed on birth certificates in a given year, none of these sources fully captures the state population or the working population. Each of these sources offers unique advantages and disadvantages. For example, DSHS and DOH records were helpful because they were the only two data sources with reported race and ethnicity. However, only a select subset of the population appears in those records. In contrast, the Department of Licensing data were the most comprehensive, covering the largest share of the working population. Driver’s license records provided the greatest marginal benefit in identifying age, sex, and residential address (which was used to impute race and ethnicity), but these records did not include ethnorace measures. Given that we had driver’s license records, the voter records from the Secretary of State added little in terms of identifying age and sex. However, Washington State’s vote-by-mail voter records were particularly helpful in determining residential location histories, which is an important issue for related work that requires placing workers into specific city or county areas.29 Were we not interested in geographic location, leaving the voting records out would have greatly simplified the merging process.

A deep understanding of the bureaucratic, social, and political processes that lead to the generation of the data is also very useful and well worth developing. For example, our knowledge of Washington State’s vote-by-mail policies provides confidence in the quality of these address data, whereas conversations with state agency analysts have helped us to be more wary about the quality of other address records. These conversations led us to believe that address updates (i.e., new addresses) were more likely to be accurate at the time of their reporting than ongoing reports of remaining at the same address.

Assembling this type of merged administrative data resource requires a substantial amount of work, resources, and political will. We were fortunate in Washington State to have a fruitful collaboration between the University of Washington and dedicated researchers from the state agencies. The full cost of securing permissions, assembling, and cleaning these data would likely have exceeded $700,000 had unpaid investigator and agency staff time been fully included. Rough estimates for producing a dataset of this size on an ongoing (annually updated) basis are of a similar order of magnitude per year given the additional tasks associated with ongoing production. However, nationally representative surveys also require substantial resources; for example, the ACS costs over $200 million to administer each year.30

The results described above illustrate that merged administrative data is a promising strategy to produce research that studies employment histories of workers in great detail with the opportunity to disaggregate by demographic characteristics. This approach allows researchers to examine trends over time and disparities across groups. While this approach is not without limitations, we believe that it is a potentially fruitful method to conduct rich research on workers and firms and build evidence on workers’ labor market experiences and the effects of labor policies.

Appendix: constructing comparable populations in WMLAD and the ACS

In this appendix, we describe how we constructed and compared different definitions of the employed population in Washington State from WMLAD and the ACS.

Specifications of the employed population in WMLAD

A key difference between WMLAD and the ACS is that the UI employment records in WMLAD encompass UI-eligible work for employers based in Washington State, whereas the ACS includes data on workers who are residents of Washington State. While most workers employed by Washington-based firms live in Washington State and most workers who reside in Washington State work for Washington-based firms, these populations are not exactly the same. For example, there are workers who live outside of Washington State but work within the state’s borders. There are also workers who live in Washington State but work for firms that are not located in the state. This scenario is particularly likely for residents of the Vancouver–Portland and Spokane–Coeur d’Alene metro areas, both of which span state borders. Therefore, to bring the WMLAD worker population closer to the ACS definition, we experiment with different methods of restricting the WMLAD population to individuals with a Washington State address. The first specification, WMLAD Specification 1, does not make any restrictions based on residential address; this specification includes all workers who had nonzero earnings at UI-eligible jobs in Washington State. WMLAD Specification 2 restricts this sample to workers who also had a Washington State address recorded at any point between 2010 and 2016 (from driver’s license, voter, DSHS client, or birth records). WMLAD Specification 3 further restricts the annual sample to workers who had a Washington State address in that specific year (also from driver’s license, voter, DSHS client, or birth records).
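The three WMLAD specifications amount to progressively tighter filters on the same set of workers. The following sketch restates them in code under simplifying assumptions: the `Worker` record, its field names, and the four example workers are hypothetical and do not reflect the actual WMLAD file layout.

```python
from dataclasses import dataclass, field

@dataclass
class Worker:
    """Hypothetical, simplified worker record (not the actual WMLAD schema)."""
    worker_id: str
    earnings_by_year: dict                               # year -> UI-covered earnings in Washington
    wa_address_years: set = field(default_factory=set)   # years with a Washington address on file

def wmlad_population(workers, year, spec):
    """Return the workers included in WMLAD Specification 1, 2, or 3 for `year`."""
    if spec not in (1, 2, 3):
        raise ValueError("spec must be 1, 2, or 3")
    # All specifications start from workers with nonzero UI-covered earnings.
    employed = [w for w in workers if w.earnings_by_year.get(year, 0) != 0]
    if spec == 1:
        return employed                                  # no address restriction
    if spec == 2:
        # Washington address recorded at any point between 2010 and 2016.
        return [w for w in employed
                if any(2010 <= y <= 2016 for y in w.wa_address_years)]
    # Specification 3: Washington address recorded in that specific year.
    return [w for w in employed if year in w.wa_address_years]

# Hypothetical example workers.
workers = [
    Worker("A", {2013: 41_000}, {2012, 2013}),   # address on file in 2013
    Worker("B", {2013: 18_500}, {2011}),         # address on file, but not in 2013
    Worker("C", {2013: 52_000}),                 # no Washington address (e.g., lives in Oregon)
    Worker("D", {2012: 30_000}, {2013}),         # no 2013 earnings
]

for spec in (1, 2, 3):
    ids = [w.worker_id for w in wmlad_population(workers, 2013, spec)]
    print(f"Specification {spec}: {ids}")
```

Worker B illustrates the tradeoff between Specifications 2 and 3: a worker whose address was recorded in a different year is kept by Specification 2 but dropped by Specification 3.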

Specifications of the employed population in the ACS

Definitions of employment in the ACS are not exactly comparable to inclusion criteria for the UI employment data. Certain occupations are exempted from UI coverage. In Washington State, the list of noncovered workers includes certain types of agricultural workers, independent contractors, entertainers, and cosmetologists. Estimates of the share of the civilian employed population eligible for UI vary. For example, the Quarterly Census of Employment and Wages from the U.S. Bureau of Labor Statistics estimates that more than 95 percent of jobs are eligible for UI.31 In contrast, the Center on Budget and Policy Priorities estimates that 82 percent of the civilian labor force was eligible for UI in 2010.32 Despite these discrepancies, our records likely capture the vast majority of employed workers in Washington State.

Hence, we also test multiple specifications of the employed population in the ACS to assess the quality of the match with WMLAD. We exclude Washington State residents who work in a state other than Washington State from all specifications. We filter the ACS Public Use Microdata Sample records based on the “total person’s earnings” variable. It is important to note that this definition differs from WMLAD in terms of the reference period; our analysis of WMLAD captures workers who had earnings throughout a given calendar year, while the ACS interviews individuals throughout the year and asks them about the 12-month period prior to the survey. However, both specifications inquire about a yearlong period of time. ACS Specification 1 comprises all Washington State residents who had nonzero earnings in the year prior to being surveyed and did not work outside of the state. In Specifications 2 and 3, we then further filter based on the “class of worker” variable to attempt to more accurately approximate the UI-eligible workforce. ACS Specification 2 includes workers employed by private organizations, nonprofit organizations, and federal, state, and local government, as well as incorporated self-employment.33 In Specification 3, we only include those employed in private and nonprofit organizations and federal, state, and local government, but we exclude all self-employment.
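The ACS specifications can be restated the same way. In the sketch below, the dictionary keys (`earnings`, `works_outside_wa`, `class_of_worker`, `weight`) and the class-of-worker labels are hypothetical stand-ins for the coded microdata variables, and the five records are made up. The `weight` field is included because population counts from survey microdata are normally computed as sums of person weights rather than counts of records.

```python
# Hypothetical labels for wage and salary classes of worker.
WAGE_AND_SALARY = {"private", "nonprofit", "federal", "state", "local"}

def acs_population(people, spec):
    """Return the person records included in ACS Specification 1, 2, or 3."""
    if spec not in (1, 2, 3):
        raise ValueError("spec must be 1, 2, or 3")
    # All specifications: nonzero earnings and no work outside Washington State.
    base = [p for p in people
            if p["earnings"] != 0 and not p["works_outside_wa"]]
    if spec == 1:
        return base
    allowed = set(WAGE_AND_SALARY)
    if spec == 2:
        allowed.add("self_employed_incorporated")   # Specification 2 keeps incorporated self-employment
    return [p for p in base if p["class_of_worker"] in allowed]

def weighted_count(people):
    """Estimate a population count as the sum of person weights."""
    return sum(p["weight"] for p in people)

# Hypothetical person records for Washington State residents.
people = [
    {"earnings": 42_000, "works_outside_wa": False, "class_of_worker": "private", "weight": 20},
    {"earnings": 80_000, "works_outside_wa": False, "class_of_worker": "self_employed_incorporated", "weight": 15},
    {"earnings": 15_000, "works_outside_wa": False, "class_of_worker": "self_employed_unincorporated", "weight": 10},
    {"earnings": 55_000, "works_outside_wa": True, "class_of_worker": "state", "weight": 25},
    {"earnings": 0, "works_outside_wa": False, "class_of_worker": "local", "weight": 30},
]

for spec in (1, 2, 3):
    print(f"ACS Specification {spec}: {weighted_count(acs_population(people, spec))}")
```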

Comparisons of WMLAD and ACS specifications

In table A-1, aggregate counts of the working population using different specifications from the WMLAD and ACS sources illustrate the difference in the overall size of these populations between 2010 and 2016.

Table A-1. Comparing the population of Washington State workers across data sources and population definitions, number of individuals
Columns 2–4: American Community Survey population definitions. Columns 5–7: Washington Merged Longitudinal Administrative Data population definitions.
Year | ACS Specification 1: Nonzero earnings | ACS Specification 2: Nonzero earnings, including incorporated self-employment | ACS Specification 3: Nonzero earnings, no self-employment | WMLAD Specification 1: No address restrictions | WMLAD Specification 2: Washington State address from 2010 to 2016 | WMLAD Specification 3: Washington State address that year
2010 | 3,478,494 | 3,247,626 | 3,117,679 | 3,285,108 | 2,977,910 | 2,889,345
2011 | 3,505,394 | 3,274,366 | 3,145,398 | 3,319,147 | 3,018,941 | 2,925,406
2012 | 3,547,257 | 3,297,814 | 3,172,015 | 3,373,667 | 3,072,258 | 2,990,863
2013 | 3,605,568 | 3,373,183 | 3,245,413 | 3,445,925 | 3,142,345 | 3,044,821
2014 | 3,662,082 | 3,434,969 | 3,297,024 | 3,531,635 | 3,207,699 | 3,125,572
2015 | 3,716,314 | 3,480,487 | 3,335,315 | 3,639,528 | 3,291,308 | 3,198,257
2016 | 3,826,931 | 3,586,353 | 3,448,720 | 3,717,191 | 3,356,950 | 3,289,027

Note: ACS = American Community Survey; WMLAD = Washington Merged Longitudinal Administrative Data.

Source: Authors’ calculations based on the American Community Survey Public Use Microdata Sample, 2010–2016, and records from the following Washington State agencies: Employment Security Department, Department of Licensing, Secretary of State, Department of Social and Health Services, and Department of Health.

The most comprehensive specification is ACS Specification 1, which is the ACS population reporting any earnings; the number of people in ACS Specification 1 ranges from 3.5 to 3.8 million workers between 2010 and 2016. This population is not restricted to UI-eligible employment and is not limited to any particular class of worker. As the ACS specifications become more restrictive, the populations become smaller, which is expected. Excluding all self-employment results in a worker population roughly 10 percent lower than the “any earnings” ACS Specification 1 in any given year. All three ACS specifications follow similar time trends.

The WMLAD populations also increase between 2010 and 2016, although at a steeper rate than the ACS populations. The largest WMLAD population, with no restrictions based on Washington State residency, yields counts in between ACS Specification 1 and ACS Specification 2. The two WMLAD definitions that restrict based on Washington State residency result in lower counts than all of the ACS specifications.
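The growth comparison can be checked directly from the 2010 and 2016 counts in table A-1. The short calculation below is an illustrative sketch; the dictionary layout and function name are choices made for the example, while the counts are copied from the table.

```python
# (2010 count, 2016 count) for each specification, copied from table A-1.
COUNTS = {
    "ACS Specification 1": (3_478_494, 3_826_931),
    "ACS Specification 2": (3_247_626, 3_586_353),
    "ACS Specification 3": (3_117_679, 3_448_720),
    "WMLAD Specification 1": (3_285_108, 3_717_191),
    "WMLAD Specification 2": (2_977_910, 3_356_950),
    "WMLAD Specification 3": (2_889_345, 3_289_027),
}

def percent_growth(first, last):
    """Percent change from the first value to the last value."""
    return 100 * (last - first) / first

growth = {name: percent_growth(*pair) for name, pair in COUNTS.items()}

for name, value in growth.items():
    print(f"{name}: {value:.1f} percent growth, 2010 to 2016")
```

Each WMLAD specification grows faster over the period than each ACS specification, which matches the steeper WMLAD trend described above.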

To summarize, the differences between these populations of workers, as illustrated in table A-1, reflect multiple key distinctions between WMLAD and the ACS. First, WMLAD is restricted to UI-eligible employment that would be captured in the administrative data. Second, WMLAD includes workers who work for a Washington-based employer, while the ACS samples from a population of individuals with Washington-based residential addresses. Further, WMLAD uses the calendar year as the reference period, while the ACS asks about the 12 months prior to the interview date, which could be at any point in the year. However, the two sources are consistent in that they both capture a 12-month period. Finally, the ACS is subject to biases that are inherent in survey research, which relies on individual self-reporting. For example, respondents may report earnings inaccurately because of recall or social desirability bias. WMLAD also has inaccuracies, but these inaccuracies are more likely due to issues such as data entry error or employer noncompliance with UI rules rather than social processes occurring during an interview. On the whole, the UI employment records represent a more precise accounting of exactly what employers paid workers in Washington State during each quarter.

Preferred specifications

We select ACS Specification 3 and WMLAD Specification 2 as our preferred specifications. We believe that ACS Specification 3, the specification that excludes all self-employment, is the closest we can get to the UI-eligible jobs included in the Employment Security Department data. For WMLAD, we believe that some filtering based on Washington State addresses is necessary given the likelihood that a substantial population of Oregonians and Idahoans work in Washington State but do not live there. However, we believe WMLAD Specification 3 is too restrictive given the nature of the address data; address information is reported sporadically in each of our data sources and is not likely to be updated regularly. Therefore, we select WMLAD Specification 2 as our preferred WMLAD specification. These are the populations used throughout the main text of the article.
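To see how closely the two preferred specifications track each other, the WMLAD Specification 2 count can be expressed as a percent shortfall relative to the ACS Specification 3 count in each year. The sketch below copies both columns from table A-1; the function and variable names are illustrative assumptions.

```python
# Annual counts copied from table A-1.
ACS_SPEC_3 = {
    2010: 3_117_679, 2011: 3_145_398, 2012: 3_172_015, 2013: 3_245_413,
    2014: 3_297_024, 2015: 3_335_315, 2016: 3_448_720,
}
WMLAD_SPEC_2 = {
    2010: 2_977_910, 2011: 3_018_941, 2012: 3_072_258, 2013: 3_142_345,
    2014: 3_207_699, 2015: 3_291_308, 2016: 3_356_950,
}

def shortfall_percent(acs, wmlad):
    """Percent by which the WMLAD count falls below the ACS count, by year."""
    return {year: 100 * (acs[year] - wmlad[year]) / acs[year]
            for year in sorted(acs)}

gaps = shortfall_percent(ACS_SPEC_3, WMLAD_SPEC_2)
for year, gap in gaps.items():
    print(f"{year}: WMLAD is {gap:.1f} percent below the ACS count")
```

The WMLAD count is below the ACS count in every year, and the gap is smallest in 2015.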

Acknowledgement

The authors wish to thank Jim Mayfield, Senior Research Scientist (retired), Washington State Department of Social and Health Services, for his work on this data, and the following funders for support of this work: Washington Center for Equitable Growth, WorkRise, and the University of Washington School of Social Work. Partial support for this research came from a Eunice Kennedy Shriver National Institute of Child Health and Human Development research infrastructure grant (P2C HD042828), a Shanahan Endowment Fellowship, and a Eunice Kennedy Shriver National Institute of Child Health and Human Development training grant (T32 HD101442) to the Center for Studies in Demography & Ecology at the University of Washington. The content is solely the responsibility of the authors and does not represent the official views of any funder.

 

Suggested citation:

Elizabeth Pelletier and Jennifer Romich, "Supplementing state employment records with demographic data," Monthly Labor Review, U.S. Bureau of Labor Statistics, February 2025, https://doi.org/10.21916/mlr.2025.2

Notes


1 The research for this article was conducted before Elizabeth Pelletier joined the U.S. Census Bureau, and the views expressed in this article are those of the authors alone and are not those of the Census Bureau.

2 Xi Song and Thomas S. Coleman, “Using administrative big data to solve problems in social science and policy research,” University of Pennsylvania Population Center Working Paper (PSC/PARC), no. 2020–58 (University of Pennsylvania Population Center, November 10, 2020), p. 19, https://repository.upenn.edu/psc_publications/58.

3 For example, see the following: Patricia R. Brown, Katie Thornton, Dan Ross, Jane A. Smith, and Lynn Wimer, “Technical report on lessons learned in the development of the Institute for Research on Poverty’s Wisconsin administrative data core” (Institute for Research on Poverty, February 2020), https://www.irp.wisc.edu/wp/wp-content/uploads/2020/08/TechnicalReport_DataCoreLessons2020.pdf; and Lars Vilhuber and Kevin McKinney, “LEHD infrastructure files in the Census RDC – Overview,” Center for Economic Studies, Working Paper Number CES-14-26, (U.S. Census Bureau, June 2014), https://www.census.gov/library/working-papers/2014/adrm/ces-wp-14-26.html.

4 We use the term ethnoracial to refer to racial and/or ethnic identities.

5 Mary Dorinda Allard and Vernon Brundage Jr., “American Indians and Alaska Natives in the U.S. labor force,” Monthly Labor Review, November 2019, https://doi.org/10.21916/mlr.2019.24; and Jennifer Laird, “Public sector employment inequality in the United States and the Great Recession,” Demography, vol. 54, no. 1, February 2017, pp. 391–411, https://doi.org/10.1007/s13524-016-0532-4.

6 Maury Gittleman, “Declining labor turnover in the United States: evidence and implications from the Panel Study of Income Dynamics,” Monthly Labor Review, January 2019, https://doi.org/10.21916/mlr.2019.1.

7 Ellora Derenoncourt and Claire Montialoux, “Minimum wages and racial inequality,” The Quarterly Journal of Economics, vol. 136, no. 1, February 2021, pp. 169–228, https://doi.org/10.1093/qje/qjaa031; and Joseph J. Sabia and Robert B. Nielsen, “Minimum wages, poverty, and material hardship: new evidence from the SIPP,” Review of Economics of the Household, vol. 13, January 2013, pp. 95–134, https://doi.org/10.1007/s11150-012-9171-8.

8 In recent years, households have been less likely to respond to surveys and to certain questions on surveys. As a result, a substantial share of information on topics such as earnings and income must be imputed to produce official estimates. Respondents may also provide incorrect information because of not knowing or remembering the answer to a question or consciously or subconsciously reporting incorrect information. Taken together, these problems enhance the possibility that survey estimates are not representative of the population as a whole and do not accurately measure social phenomena. For more, see the following: Bruce D. Meyer, Wallace K.C. Mok, and James X. Sullivan, “Household surveys in crisis,” Journal of Economic Perspectives, vol. 29, no. 4, fall 2015, pp. 199–226, https://doi.org/10.1257/jep.29.4.199.

9 Vilhuber and McKinney, “LEHD infrastructure files in the Census RDC – Overview.”

10 “State partners,” Longitudinal Employer–Household Dynamics (U.S. Census Bureau, July 9, 2024), https://lehd.ces.census.gov/state_partners/.

11 “Data,” Longitudinal Employer–Household Dynamics (U.S. Census Bureau), https://lehd.ces.census.gov/data/.

12 Brown et al., “Technical report on lessons learned in the development of the Institute for Research on Poverty’s Wisconsin administrative data core;” David Mancuso and Alice Huber, “Washington State Health and Human Services Integrated Client Databases,” DSHS Research and Data Analysis Division, RDA Report 11.205 (Washington State Department of Social and Health Services, December 2021), https://www.dshs.wa.gov/sites/default/files/rda/reports/research-11-205.pdf; and A. Hawn Nelson, D. Jenkins, S. Zanti, M. Katz, T. Burnett, D. Culhane, K. Barghaus, et al., Introduction to Data Sharing and Integration, Actionable Intelligence for Social Policy (University of Pennsylvania, 2020), https://aisp.upenn.edu/wp-content/uploads/2020/06/AISP-Intro-.pdf.

13 Mark C. Long, Elizabeth Pelletier, and Jennifer Romich, “Constructing monthly residential locations using merged state administrative data,” Population Studies, vol. 76, no. 2, July 2022, pp. 253–272, https://www.tandfonline.com/doi/abs/10.1080/00324728.2022.2085776.

14 This database includes unemployment insurance employment records from the Washington State Employment Security Department and birth records from the Department of Health. For more information, see Mancuso and Huber, “Washington State Health and Human Services Integrated Client Databases.”

15 Kevin Campbell, the Link King developer, joined the Research and Data Analysis division of Washington State’s Department of Social and Health Services for this project and carried out the matching personally. Kevin M. Campbell, “Rule your data with the Link King: a SAS/AF application for record linkage and unduplication,” Paper 020-30 (SAS Users Group International, 2005), https://support.sas.com/resources/papers/proceedings/proceedings/sugi30/020-30.pdf.

16 William J. Congdon and Batia Katz, “Job quality and wage records: the potential role of administrative wage data for understanding job quality” (Urban Institute, May 2023), https://www.urban.org/sites/default/files/2023-05/Job%20Quality%20and%20Wage%20Records.pdf.

17 “Using publicly available information to proxy for unidentified race and ethnicity: a methodology and assessment” (Consumer Financial Protection Bureau, summer 2014), https://www.consumerfinance.gov/data-research/research-reports/using-publicly-available-information-to-proxy-for-unidentified-race-and-ethnicity/; and Marc N. Elliott, Allen Fremont, Peter A. Morrison, Philip Pantoja, and Nicole Lurie, “A new method for estimating race/ethnicity and associated disparities where administrative records lack self-reported race/ethnicity,” Health Services Research, vol. 43, no. 5, part 1, October 2008, pp. 1722–1736, https://doi.org/10.1111/j.1475-6773.2008.00854.x.

18 Marc N. Elliott, Peter A. Morrison, Allen Fremont, Daniel F. McCaffrey, Philip Pantoja, and Nicole Lurie, “Using the Census Bureau’s surname list to improve estimates of race/ethnicity and associated disparities,” Health Services and Outcomes Research Methodology, vol. 9, no. 2, April 2009, pp. 69–83, https://doi.org/10.1007/s10742-009-0047-1; and “Proxy methodology,” GitHub (Consumer Financial Protection Bureau, 2017), https://github.com/cfpb/proxy-methodology.

19 Although we refer to this information as self-reported, Department of Social and Health Services ethnorace data are not always self-reported, as described above. When clients are unwilling to report this information, caseworkers can fill it in based on observation.

20 This includes wages, salary, Armed Forces pay, commissions, tips, piece-rate payments, and cash bonuses earned before deductions are made (e.g., for taxes, bonds, pensions, or union dues).

21 “Occupations exempted from unemployment insurance coverage” (Washington State Employment Security Department), https://esd.wa.gov/media/pdf/1080/esd-exempt-professions-chartpdf/download?inline.

22 We considered refining our American Community Survey (ACS) definition based on occupation. However, many unemployment insurance eligibility rules are based on a combination of occupation and other employment characteristics, making it unfeasible to accurately refine the ACS population further based on occupation alone.

23 Results for other years were similar and are available from the authors upon request.

24 Charles Hokayem, Christopher Bollinger, and James P. Ziliak, “The role of CPS nonresponse in the measurement of poverty,” Journal of the American Statistical Association, vol. 110, no. 511, November 2015, pp. 935–945, https://doi.org/10.1080/01621459.2015.1029576; and Barry W. Johnson and Kevin Moore, “Differences in income estimates derived from survey and tax data,” SOI Paper Series (Internal Revenue Service, 2008), https://www.irs.gov/pub/irs-soi/08rpjohnson.pdf.

25 Jonathan A. Schwabish, “Take a penny, leave a penny: the propensity to round earnings in survey data,” Journal of Economic and Social Measurement, vol. 32, no. 2–3, December 2007, pp. 93–111, https://doi.org/10.3233/JEM-2007-0284.

26 “Item allocation rates,” American Community Survey (U.S. Census Bureau), https://www.census.gov/acs/www/methodology/sample-size-and-data-quality/item-allocation-rates/#basic_demographics.

27 Ioan Voicu, “Using first name information to improve race and ethnicity classification,” Statistics and Public Policy, vol. 5, no. 1, March 2018, pp. 1–13, https://doi.org/10.1080/2330443X.2018.1427012.

28 T.C.E. Abrahamson-Richards, “Examining differential birth outcomes among American Indian parents who delivered newborn infants before and after the institution of Washington’s Paid Family Leave Program,” unpublished abstract, 2021.

29 Long, Pelletier, and Romich, “Constructing monthly residential locations using merged state administrative data.”

30 “Fiscal year 2022 budget summary” (U.S. Census Bureau, 2021), https://www2.census.gov/about/budget/census-fy-22-budget-infographic-bureau-overview.pdf.

31 “QCEW overview,” Quarterly Census of Employment and Wages (U.S. Bureau of Labor Statistics, last modified December 2023), https://www.bls.gov/cew/overview.htm.

32 Chad Stone and William Chen, “Introduction to unemployment insurance” (Center on Budget and Policy Priorities, July 30, 2014), https://www.cbpp.org/sites/default/files/atoms/files/12-19-02ui.pdf.

33 This includes workers employed by private for-profit companies, nonprofit organizations, or federal, state, or local governments, as well as self-employed workers in incorporated businesses. It excludes workers in unincorporated self-employment and unpaid family or farm labor.

 

About the Author

Elizabeth Pelletier
elizabeth.pelletier@census.gov

Elizabeth Pelletier is an economist at the U.S. Census Bureau.

Jennifer Romich
romich@uw.edu

Jennifer Romich is a professor at the School of Social Work, University of Washington.
