Article
April 2023

Federal government wage indexes

For nearly 50 years, the Employment Cost Index (ECI) has been providing the public with estimates of the change in employer labor costs. We explore the practicality of constructing federal wage indexes, in the spirit of the ECI, using Office of Personnel Management (OPM) salary data. To accomplish this task, we aggregate OPM records into occupation and industry groups. Although these salary data have a crosswalk for mapping OPM occupation codes into the Standard Occupational Classification system, no corresponding crosswalk exists for industries. A key hurdle, therefore, involves creating a crosswalk that assigns industry codes to OPM establishments. We create this crosswalk by developing an algorithm that uses Quarterly Census of Employment and Wages data and machine-learning tools to match agencies with a unique industry. With this agency-North American Industry Classification System crosswalk, we calculate annual Laspeyres, Paasche, and Fisher wage indexes for several aggregations. The resulting wage inflation rates are plausible and are reasonably close to state and local wage inflation rates but deviate from the private industry wage inflation rates.

The Employment Cost Index (ECI) of the National Compensation Survey (NCS) has provided the public with estimates of changes in labor costs since December 1975. At the ECI launch, only private industry estimates were published; however, in June 1981, the ECI expanded to include state and local government workers. The federal government, despite being the largest U.S. employer with over 3 million employees (see table 1), is presently out of scope for NCS data products. This article explores, as a proof of concept, the practicality of constructing federal wage indexes using Office of Personnel Management (OPM) salary data. Since this analysis is purely exploratory, we do not attempt to fully replicate ECI methodology, but instead use it as a guide.

Table 1. Number and percentage of civilian federal workers, by occupation and industry, second quarter of 2020, 2021, and 2022

Category | 2020 second quarter, number | 2020 second quarter, percent | 2021 second quarter, number | 2021 second quarter, percent | 2022 second quarter, number | 2022 second quarter, percent

Occupation
Management, business, and financial | 1,588,381 | 49.6 | 1,608,050 | 49.5 | 1,604,617 | 50.0
Professional and related | 924,123 | 28.9 | 949,506 | 29.2 | 940,476 | 29.3
Sales and related | 10,908 | 0.3 | 10,279 | 0.3 | 9,330 | 0.3
Office and administrative support | 301,965 | 9.4 | 304,487 | 9.4 | 293,667 | 9.1
Service | 251,655 | 7.9 | 255,296 | 7.9 | 244,730 | 7.6
Construction, extraction, farming, fishing and forestry | 17,075 | 0.5 | 16,680 | 0.5 | 16,408 | 0.5
Installation, maintenance and repair | 18,298 | 0.6 | 18,862 | 0.6 | 18,380 | 0.6
Production | 12,549 | 0.4 | 12,496 | 0.4 | 12,115 | 0.4
Transportation and material moving | 75,143 | 2.3 | 74,427 | 2.3 | 71,750 | 2.2

Industry
Wholesale and retail trade | 37,199 | 1.2 | 36,203 | 1.1 | 35,752 | 1.1
Transportation and warehousing | 7,123 | 0.2 | 7,117 | 0.2 | 7,345 | 0.2
Elementary and secondary schools | 9,012 | 0.3 | 9,176 | 0.3 | 9,299 | 0.3
Colleges, universities, and professional schools | 3,273 | 0.1 | 3,330 | 0.1 | 2,774 | 0.1
Hospitals | 39,908 | 1.2 | 40,945 | 1.3 | 49,224 | 1.5
Nursing and residential care facilities | 875 | 0.0 | 878 | 0.0 | 852 | 0.0
Rest of health services | 5,975 | 0.2 | 6,174 | 0.2 | 6,126 | 0.2
Rest of services | 80,381 | 2.5 | 82,526 | 2.5 | 83,502 | 2.6
Public administration | 3,005,275 | 93.9 | 3,052,558 | 93.9 | 3,005,394 | 93.6
Goods producing | 11,076 | 0.3 | 11,176 | 0.3 | 11,205 | 0.3

Work schedule
Full time | 3,097,080 | 96.8 | 3,147,790 | 96.9 | 3,115,654 | 97.0
Part time | 103,017 | 3.2 | 102,293 | 3.1 | 95,819 | 3.0

Total | 3,200,097 | 100.0 | 3,250,083 | 100.0 | 3,211,473 | 100.0

Source: Authors’ calculations using data from the Office of Personnel Management.

To construct federal wage indexes, we must overcome one major hurdle: records from the OPM data must be categorized into industry (see appendix table A-1) and occupation groups (see appendix table A-2) that are consistent with NCS aggregations used for the ECI.1 The latter is straightforward because the U.S. Bureau of Labor Statistics (BLS) uses a crosswalk classification system to map OPM occupations into the Standard Occupational Classification (SOC) system. The former, in contrast, is more difficult because the OPM data do not contain industry codes. To address this problem, we use the department and agency information in the OPM data and machine-learning tools to match OPM and Quarterly Census of Employment and Wages (QCEW) establishments.2 An algorithm is developed to select a unique North American Industry Classification System (NAICS) code for each agency observed in the OPM data. This final mapping yields a desired agency-to-NAICS crosswalk that we use to calculate Laspeyres, Paasche, and Fisher wage indexes for a variety of aggregations.3

Wage index number formulas

We have many index number formulas to choose from, including the commonly used Laspeyres and Paasche indexes and the less commonly used Dutot or Jevons indexes.4 For exploratory purposes and brevity, we focus on the Laspeyres, Paasche, and Fisher indexes.

Given wages and employment for periods 0 (base period) and 1 (comparison period), the Laspeyres and Paasche wage index number formulas use a fixed “basket” of jobs (employment) to compute the ratio of total wage costs for period 1 to total wage costs for period 0. The Laspeyres index takes period-0 employment as the fixed basket, whereas the Paasche index takes period-1 employment. These formulas are given by

$$I_L = \sum_{i=1}^{n} s_i^0 \, \frac{w_i^1}{w_i^0}$$

and

$$I_P = \left[ \sum_{i=1}^{n} s_i^1 \left( \frac{w_i^1}{w_i^0} \right)^{-1} \right]^{-1},$$

where $I_L$ and $I_P$ are the Laspeyres and Paasche indexes, $w_i^t$ is hourly wage, $s_i^t$ is the expenditure share, and i is job 1, 2, ..., n. The expenditure share is given by

$$s_i^t = \frac{w_i^t e_i^t}{\sum_{j=1}^{n} w_j^t e_j^t},$$

where $e_i^t$ is employment, i and j are jobs 1, 2, ..., n, and t = 0, 1 is the period.5 In theory, employers can be expected to substitute away from more expensive workers. Since the Laspeyres index uses a period-0 fixed-employment basket, the Laspeyres index theoretically overstates wage inflation. Conversely, since the Paasche index uses a period-1 fixed-employment basket, the Paasche index theoretically understates wage inflation.

The Fisher wage index is given by the geometric mean of the Laspeyres and Paasche indexes as

$$I_F = \sqrt{I_L \times I_P}.$$

Along with the Törnqvist index, the Fisher index is considered to be “superlative,” with a base and comparison period treated symmetrically to better capture labor substitution effects.6
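As a small worked illustration of the three formulas, the following Python sketch computes them from wage relatives and expenditure shares. The two-job wages and employment counts are made-up numbers, and the function names are hypothetical; this is not the authors' code.

```python
from math import sqrt


def expenditure_shares(wages, employment):
    """Share of total wage cost attributable to each job in one period."""
    costs = [w * e for w, e in zip(wages, employment)]
    total = sum(costs)
    return [c / total for c in costs]


def laspeyres(w0, w1, e0):
    """Period-0 shares applied to wage relatives w1/w0."""
    s0 = expenditure_shares(w0, e0)
    return sum(s * (b / a) for s, a, b in zip(s0, w0, w1))


def paasche(w0, w1, e1):
    """Harmonic form: period-1 shares applied to inverse wage relatives."""
    s1 = expenditure_shares(w1, e1)
    return 1.0 / sum(s * (a / b) for s, a, b in zip(s1, w0, w1))


def fisher(w0, w1, e0, e1):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return sqrt(laspeyres(w0, w1, e0) * paasche(w0, w1, e1))


# Hypothetical two-job example (hourly wages and employment counts).
w0, w1 = [20.0, 40.0], [22.0, 41.0]
e0, e1 = [100, 50], [90, 60]

index_l = laspeyres(w0, w1, e0)   # equals 4250 / 4000
index_p = paasche(w0, w1, e1)     # equals 4440 / 4200
index_f = fisher(w0, w1, e0, e1)
```

In this toy example employment shifts toward the job with slower wage growth, so the Laspeyres value exceeds the Paasche value and the Fisher value lies between them.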

Data

BLS has four quarters of OPM data: first quarter of 2019 and second quarter of 2020, 2021, and 2022. For this analysis, we omit the data from the first quarter of 2019 for two reasons. First, 2019 (first quarter) to 2020 (second quarter) straddled the start of the COVID-19 pandemic, which saw large and uncharacteristic changes in the labor market. Second, 2019 (first quarter) to 2020 (second quarter) was a five-quarter period that included two federal salary increases. The data cover workers employed at the end of each quarter. Note that the data are reported to OPM by human resource offices across the federal government and may be subject to some error. If the federal workforce were incorporated into the ECI, data would need to be collected quarterly from OPM.

OPM data include individual federal employees, annual salary, OPM occupation, full-time or part-time status,7 grade, agency, city, and state. BLS’s OPM data include workers on military bases (which we exclude) but not postal service employees.8 These data do not include any benefit-cost data (e.g., health insurance, retirement, nonproduction bonuses). All salaries are given as annual full-time salaries, so hourly wages are computed by dividing salary by 2,087.9 Missing from OPM data are industry data (NAICS codes), so we use QCEW data and some machine-learning tools to construct an agency-to-NAICS concordance.
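The salary-to-hourly conversion described above is a single division. A minimal sketch, where the salary figure is a hypothetical example and not a value from the OPM data:

```python
HOURS_DIVISOR = 2087  # annual-hours divisor cited in the text


def hourly_wage(annual_salary):
    """Convert an annual full-time salary to an hourly rate."""
    return annual_salary / HOURS_DIVISOR


# Hypothetical salary chosen so the result is a round number.
example_rate = hourly_wage(104_350)
```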

Also missing from the OPM data are establishment identifiers. So, we identify them by what we observe: agency, city, and state data, which can be used as imperfect proxies for an establishment. When an agency has just a single establishment within a city, city and state work as a perfect proxy. But if an agency has multiple establishments within a city, city and state are imperfect because multiple establishments are identified as a single establishment.

With an agency-to-NAICS crosswalk and a method for identifying establishments, we then map SOC and NAICS codes into occupation and industry groups (sometimes referred to as pseudo-SOC [PSOC] and pseudo-NAICS [PNAICS]). (See appendix tables A-1 and A-2.) Mean wages and total employment are computed for each basic ECI cell (a grouping by PSOC, PNAICS, and job) or subcell (a grouping by PSOC, PNAICS, subcell category, and job). Summary statistics, including employment counts and percentages of total employment from the OPM, are presented in table 1.

Since this analysis is purely exploratory, we do not attempt to reproduce the method for computing the ECI but instead use its basic conceptual framework for computing wage cost indexes with common index number formulas.10 For the ECI, the unit of observation is a quote (such as an establishment, occupation, work status, or grade). These quotes are aggregated into cells consisting of an ownership sector, industry group (PNAICS), and occupation group (PSOC). Cells can be further divided into subcells that may include full- or part-time status, region, division, union status, and so forth.

NAICS codes

Missing from the OPM data are NAICS codes. We construct an agency-to-NAICS crosswalk using QCEW-reported NAICS codes for federal government establishments. The OPM data have standardized, descriptive text for each department and agency. In the QCEW, the department, agency, and NAICS codes are reported individually by each establishment. These reports are subject to variations in establishment practice and can include spelling errors and varying abbreviations. For these reasons, matching the OPM establishments with QCEW establishments is not straightforward.

To construct an agency-to-NAICS crosswalk, we begin by aggregating individual employee data in the OPM data to agency by location. We then match each OPM agency and location with each QCEW establishment by year or quarter, state, and county. For each of these matches, cosine similarities are then calculated for term frequency–inverse document frequency (or TF–IDF) vectorized department descriptions and agency descriptions. This approach essentially amounts to the construction of a cardinal measure of similarity between two vectors. A number of options exist for constructing these vectors for a given match’s descriptions. We have explored bag-of-words unigrams (an unordered list of the individual words from the descriptions) and character n-grams (a contiguous sequence of n characters from a piece of text). We ultimately chose character n-grams because they account for the issue of spelling errors or variations. A key problem with selecting a vectorization strategy is the lack of an objective standard. That is, in the absence of an objective standard, any choice between vectorization strategies possesses some level of arbitrariness.
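To make the similarity measure concrete, here is a minimal pure-Python sketch of character n-gram TF-IDF vectors compared by cosine similarity. The trigram length, the padding, the smoothed IDF formula, and the example names are all assumptions chosen for illustration; the article does not specify these details, and the authors' implementation may differ.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Contiguous character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower().strip()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(docs, n=3):
    """Sparse TF-IDF vectors (dicts) using a smoothed IDF (an assumption)."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    n_docs = len(docs)
    idf = {g: math.log((1 + n_docs) / (1 + k)) + 1.0 for g, k in doc_freq.items()}
    return [{g: tf * idf[g] for g, tf in c.items()} for c in counts]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(x * v.get(g, 0.0) for g, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical descriptions: one standardized name and two free-text reports,
# the first of which contains an abbreviation and a spelling error.
opm_name = "DEPARTMENT OF VETERANS AFFAIRS"
qcew_names = ["Dept of Veterans Affiars", "Department of the Interior"]

vectors = tfidf_vectors([opm_name] + qcew_names)
scores = [cosine(vectors[0], v) for v in vectors[1:]]
```

Because most trigrams of the misspelled name still overlap with the standardized name, it scores higher than the correctly spelled but unrelated department, which is the property that motivates character n-grams over whole-word tokens.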

For a given vectorizer, we use the mean of the cosine similarities for department and agency, weighting by QCEW-reported mean employment and upweighting and downweighting by the relative deviation between employee counts in the OPM establishment-level data and QCEW-reported mean employment. We assume here that larger establishments are more reliable but may also be “punished” for large differences in the reporting of a variable that should be similar. The QCEW department or agency with the best weighted cosine similarity is chosen as the match.

Finally, since each department or agency should uniquely match a NAICS code, we compare the weighted cosine similarity among all establishments for a department or agency and select the NAICS code for the establishment with the best matching weighted cosine similarity. As constructed, the crosswalk is not without flaws, with a mean agency-size weighted score of 0.76 (standard deviation 0.161) and ranging from nearly the worst (0.002) to the best (1.000). The cumulative distribution of cosine similarity scores, weighted by agency size (see chart 1), shows that the bulk of matches are fairly reliable (>0.8), with very few that are clearly unreliable (<0.4). Moreover, the federal government distribution of PNAICS in the OPM dataset roughly matches that for the QCEW data (see chart 2).
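The final selection step can be sketched as follows, assuming a weighted similarity score has already been computed for every candidate match. The record layout, agency names, scores, and employment figures are hypothetical, and the sketch does not reproduce the employment-based weighting described above, whose exact form the article does not give.

```python
def build_crosswalk(candidates):
    """Keep, for each agency, the candidate with the highest score."""
    best = {}
    for c in candidates:
        current = best.get(c["agency"])
        if current is None or c["score"] > current["score"]:
            best[c["agency"]] = c
    return best


def size_weighted_mean_score(best):
    """Mean of the selected scores, weighted by agency employment."""
    total_employment = sum(c["employment"] for c in best.values())
    weighted = sum(c["score"] * c["employment"] for c in best.values())
    return weighted / total_employment


# Hypothetical candidate matches (scores and employment are made up).
candidates = [
    {"agency": "Agency A", "naics": "923140", "score": 0.91, "employment": 300},
    {"agency": "Agency A", "naics": "622110", "score": 0.95, "employment": 300},
    {"agency": "Agency B", "naics": "921190", "score": 0.40, "employment": 100},
]

crosswalk = build_crosswalk(candidates)
agency_to_naics = {agency: c["naics"] for agency, c in crosswalk.items()}
mean_score = size_weighted_mean_score(crosswalk)
```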

For computing exploratory wage indexes, this imperfect crosswalk is sufficient. Publishing indexes based on OPM data, however, would require dedicated analyst labor to create a more accurate crosswalk.

Wage index calculations

To compute wage indexes, we first partition the OPM microdata into establishments (department, agency, and city and state) and jobs (occupation, full- or part-time status, and grade).11 Next, we compute average hourly rates and number of employees for each job within an establishment. The establishment-job data are then matched between the second quarter of 2020 and the second quarter of 2021 and between the second quarter of 2021 and the second quarter of 2022. The resulting matched data are partitioned by cell (PNAICS and PSOC) and period. We then calculate weighted average wages and total employment. Finally, we aggregate these data into wage indexes with the use of the Laspeyres, Paasche, and Fisher formulas. To compute subcell wage indexes, we partition the matched establishment-job data by subcell (PNAICS, PSOC, subcell category). Then, we calculate weighted average wages and total employment and aggregate them into subcell wage indexes. Note that for the published ECI, the base period is fixed and all comparisons are relative to the current base quarter (currently the fourth quarter of 2005). In contrast, for each matched pair of OPM datasets (e.g., the second quarter of 2020 to the second quarter of 2021), the base period is the earlier time (e.g., the second quarter of 2020) so that the time series of indexes for each cell and subcell is what is termed “chained.”
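The matching and chaining steps can be illustrated with a simplified Python sketch. It averages wages within each establishment-job pair, keeps only the pairs observed in both periods, and computes a Laspeyres link directly from them, skipping the intermediate aggregation into PNAICS-by-PSOC cells. All records, names, and helper functions are hypothetical.

```python
from collections import defaultdict


def establishment_job_cells(records):
    """Mean hourly wage and head count for each (establishment, job) pair."""
    sums = defaultdict(lambda: [0.0, 0])
    for establishment, job, wage in records:
        cell = sums[(establishment, job)]
        cell[0] += wage
        cell[1] += 1
    return {key: (total / count, count) for key, (total, count) in sums.items()}


def laspeyres_matched(base_cells, comparison_cells):
    """Laspeyres link over pairs present in both periods (base-period weights)."""
    matched = base_cells.keys() & comparison_cells.keys()
    numerator = sum(comparison_cells[k][0] * base_cells[k][1] for k in matched)
    denominator = sum(base_cells[k][0] * base_cells[k][1] for k in matched)
    return numerator / denominator


# Establishment proxy: (agency, city, state). Job: (occupation, schedule, grade).
est_a = ("Agency A", "Denver", "CO")
est_b = ("Agency B", "Boise", "ID")
job_x = ("Accountant", "full time", "GS-11")
job_y = ("Attorney", "full time", "GS-14")

period0 = [(est_a, job_x, 30.0), (est_a, job_x, 32.0),
           (est_a, job_y, 50.0), (est_b, job_x, 28.0)]
period1 = [(est_a, job_x, 31.0), (est_a, job_x, 33.0),
           (est_a, job_x, 35.0), (est_a, job_y, 52.0)]

cells0 = establishment_job_cells(period0)
cells1 = establishment_job_cells(period1)
link = laspeyres_matched(cells0, cells1)  # est_b drops out: no match in period 1

# Chaining multiplies successive links. Using the two basic-cell Laspeyres
# values reported in table 2 gives the implied change over both years.
chained_2020_to_2022 = 1.0131 * 1.0342
```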

Laspeyres, Paasche, and Fisher wage index calculations are shown in tables 2 through 6 for the basic cell aggregation and for a variety of subcell aggregations. We find that our computed rates of inflation are reasonable. Note that the calculations of the Laspeyres, Paasche, and Fisher wage indexes are quite close and, in some instances, equal up to the fourth decimal. This result is similar to other research results.12 This present research also showed that the expected pattern in which the Laspeyres index exceeds the Paasche index is frequently reversed.13 Finally, a comparison of the federal Laspeyres index with the official ECI is given in table 7. Perhaps unsurprisingly, the exploratory federal ECI is more closely aligned with the state and local ECI.

Table 2. Wage index calculations of basic cell, 2020 second quarter to 2022 second quarter

Period | Laspeyres | Paasche | Fisher
2020 Q2 to 2021 Q2 | 1.0131 | 1.0131 | 1.0131
2021 Q2 to 2022 Q2 | 1.0342 | 1.0341 | 1.0341

Note: Q2 = second quarter. Wage index data are aggregated into basic cells consisting of ownership sector, industry group, and occupation group.

Source: Authors’ calculations using data from the Office of Personnel Management.

Table 3. Wage index calculations of full-time and part-time work schedules, 2020 second quarter to 2022 second quarter

Work schedule | Period | Laspeyres | Paasche | Fisher
Full time | 2020 Q2 to 2021 Q2 | 1.0130 | 1.0129 | 1.0129
Full time | 2021 Q2 to 2022 Q2 | 1.0337 | 1.0337 | 1.0337
Part time | 2020 Q2 to 2021 Q2 | 1.0366 | 1.0361 | 1.0363
Part time | 2021 Q2 to 2022 Q2 | 1.0427 | 1.0425 | 1.0426

Note: Q2 = second quarter.

Source: Authors’ calculations using data from the Office of Personnel Management.

Table 4. Wage index calculations, by Census divisions, 2020 second quarter to 2022 second quarter

Census division | Period | Laspeyres | Paasche | Fisher
New England | 2020 Q2 to 2021 Q2 | 1.0123 | 1.0122 | 1.0122
New England | 2021 Q2 to 2022 Q2 | 1.0491 | 1.0490 | 1.0490
Middle Atlantic | 2020 Q2 to 2021 Q2 | 1.0144 | 1.0144 | 1.0144
Middle Atlantic | 2021 Q2 to 2022 Q2 | 1.0355 | 1.0355 | 1.0355
East South Central | 2020 Q2 to 2021 Q2 | 1.0183 | 1.0184 | 1.0184
East South Central | 2021 Q2 to 2022 Q2 | 1.0370 | 1.0368 | 1.0369
South Atlantic | 2020 Q2 to 2021 Q2 | 1.0097 | 1.0097 | 1.0097
South Atlantic | 2021 Q2 to 2022 Q2 | 1.0320 | 1.0320 | 1.0320
East North Central | 2020 Q2 to 2021 Q2 | 1.0159 | 1.0158 | 1.0159
East North Central | 2021 Q2 to 2022 Q2 | 1.0341 | 1.0341 | 1.0341
West North Central | 2020 Q2 to 2021 Q2 | 1.0152 | 1.0152 | 1.0152
West North Central | 2021 Q2 to 2022 Q2 | 1.0378 | 1.0375 | 1.0377
West South Central | 2020 Q2 to 2021 Q2 | 1.0145 | 1.0145 | 1.0145
West South Central | 2021 Q2 to 2022 Q2 | 1.0345 | 1.0346 | 1.0345
Mountain | 2020 Q2 to 2021 Q2 | 1.0150 | 1.0150 | 1.0150
Mountain | 2021 Q2 to 2022 Q2 | 1.0398 | 1.0395 | 1.0397
Pacific | 2020 Q2 to 2021 Q2 | 1.0207 | 1.0206 | 1.0206
Pacific | 2021 Q2 to 2022 Q2 | 1.0437 | 1.0436 | 1.0437

Note: Q2 = second quarter.

Source: Authors’ calculations using data from the Office of Personnel Management.

Table 5. Wage index calculations, by Census region, 2020 second quarter to 2022 second quarter

Census region | Period | Laspeyres | Paasche | Fisher
Northeast | 2020 Q2 to 2021 Q2 | 1.0139 | 1.0139 | 1.0139
Northeast | 2021 Q2 to 2022 Q2 | 1.0388 | 1.0388 | 1.0388
South | 2020 Q2 to 2021 Q2 | 1.0155 | 1.0155 | 1.0155
South | 2021 Q2 to 2022 Q2 | 1.0361 | 1.0360 | 1.0360
Midwest | 2020 Q2 to 2021 Q2 | 1.0111 | 1.0111 | 1.0111
Midwest | 2021 Q2 to 2022 Q2 | 1.0322 | 1.0322 | 1.0322
West | 2020 Q2 to 2021 Q2 | 1.0180 | 1.0179 | 1.0180
West | 2021 Q2 to 2022 Q2 | 1.0412 | 1.0411 | 1.0412

Note: Q2 = second quarter.

Source: Authors’ calculations using data from the Office of Personnel Management.

Table 6. Wage index calculations by size class, 2020 second quarter to 2022 second quarter

Size class by number of employees | Period | Laspeyres | Paasche | Fisher
1 (<50) | 2020 Q2 to 2021 Q2 | 1.0189 | 1.0189 | 1.0189
1 (<50) | 2021 Q2 to 2022 Q2 | 1.0402 | 1.0402 | 1.0402
2 (51 to 100) | 2020 Q2 to 2021 Q2 | 1.0117 | 1.0116 | 1.0116
2 (51 to 100) | 2021 Q2 to 2022 Q2 | 1.0407 | 1.0408 | 1.0408
3 (101 to 500) | 2020 Q2 to 2021 Q2 | 1.0104 | 1.0104 | 1.0104
3 (101 to 500) | 2021 Q2 to 2022 Q2 | 1.0349 | 1.0349 | 1.0349
4 (>500) | 2020 Q2 to 2021 Q2 | 1.0133 | 1.0132 | 1.0132
4 (>500) | 2021 Q2 to 2022 Q2 | 1.0342 | 1.0341 | 1.0342

Note: Q2 = second quarter.

Source: Authors’ calculations using data from the Office of Personnel Management.

Table 7. Comparison of federal Laspeyres wage index with official Employment Cost Index for wages and salary, 2020 second quarter to 2022 second quarter

Period | Private industry | State and local | Exploratory federal
2020 Q2 to 2021 Q2 | 1.0356 | 1.0162 | 1.0131
2021 Q2 to 2022 Q2 | 1.0567 | 1.0323 | 1.0342

Note: Q2 = second quarter.

Source: Authors’ calculations using data from the Office of Personnel Management.

Conclusion

This analysis demonstrates the practicality of using OPM data to compute a federal government wage component of the ECI. Other elements of the ECI may also be feasible if benefit-cost and hours data can be acquired. Given the magnitude of the U.S. federal workforce, its inclusion would expand NCS coverage and fill a void in information about federal workers. Although the annually announced federal pay increase provides some information about federal employment cost growth, it is an imprecise indicator: actual cost growth depends on the flow of employees into and out of federal service and the mix of employee tenures. The calculation of a wage or employment cost index would provide BLS data users with useful measures of the growth of federal employment costs.

Further exploration of OPM data for use with the NCS will be enhanced by access to benefit-cost data. Even though acquiring benefit-cost data might be infeasible, we believe that the construction of federal wage indexes would prove a valuable addition to the NCS. The addition of the federal workforce to the NCS will require an analyst-validated NAICS crosswalk, which we view to be an attainable goal considering the findings presented in this article.

Appendix: North American Industry Classification System codes by industry and Standard Occupational Classification codes by occupation

Table A-1. Government industry group definitions, including codes

PNAICS | NAICS | Industry
G000 | 21, 23, 31 to 33 | Goods producing
4400 | 221 | Utilities
420A | 42 to 45 | Wholesale and retail trade
4300 | 48, 49 | Transportation and warehousing
6110 | 6111 | Elementary and secondary schools
6112 | 6112 | Junior colleges
6113 | 6113 | Colleges, universities, and professional schools
61R0 | 61, excluding 6111 to 6113 | Rest of educational services
6220 | 622 | Hospitals
6230 | 623 | Nursing and residential care facilities
62R0 | 621, 624 | Rest of health services
9200 | 92, excluding 928 | Public administration
81R0 | 51 to 56, 71 to 81, excluding 814 | Rest of services

Note: PNAICS = pseudo-North American Industry Classification System, and NAICS = North American Industry Classification System.

Source: U.S. Bureau of Labor Statistics.
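The grouping in table A-1 can be expressed as a small lookup function. The sketch below is one possible encoding, assuming NAICS codes arrive as digit strings of at least four digits (typically six); the function name is hypothetical, and codes outside the table return None.

```python
def pnaics_for_naics(naics_code):
    """Map a NAICS code to its table A-1 industry group, or None if out of scope."""
    code = str(naics_code).strip()
    sector = int(code[:2])
    # Most specific prefixes are checked first.
    if code.startswith("6111"):
        return "6110"
    if code.startswith("6112"):
        return "6112"
    if code.startswith("6113"):
        return "6113"
    if sector == 61:
        return "61R0"
    if code.startswith("622"):
        return "6220"
    if code.startswith("623"):
        return "6230"
    if code.startswith(("621", "624")):
        return "62R0"
    if code.startswith("221"):
        return "4400"
    if sector in (21, 23, 31, 32, 33):
        return "G000"
    if 42 <= sector <= 45:
        return "420A"
    if sector in (48, 49):
        return "4300"
    if sector == 92 and not code.startswith("928"):
        return "9200"
    in_services = 51 <= sector <= 56 or 71 <= sector <= 81
    if in_services and not code.startswith("814"):
        return "81R0"
    return None
```

For example, `pnaics_for_naics("622110")` returns `"6220"`, while a code beginning with 928 returns `None` because table A-1 excludes it from public administration.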

Table A-2. Occupation group definitions, including codes

PSOC | SOC | Occupation
110 | 11, 13 | Management, business, and financial
120 | 15, 17, 19, 21, 23, 25, 27, 29 | Professional and related
210 | 41 | Sales and related
220 | 43 | Office and administrative support
300 | 31 to 39 | Service
405 | 45, 47 | Farm, fishing, forestry, construction, and extraction
430 | 49 | Installation, maintenance, and repair
510 | 51 | Production
520 | 53 | Transportation and material moving

Note: PSOC = pseudo-Standard Occupational Classification, and SOC = Standard Occupational Classification.

Source: U.S. Bureau of Labor Statistics.
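Table A-2 amounts to a lookup from the two-digit SOC major group to an occupation group. A minimal sketch, assuming SOC codes are strings in the usual "NN-NNNN" form; the dictionary and function names are hypothetical.

```python
# SOC major group (first two digits) -> PSOC code, following table A-2.
PSOC_BY_SOC_MAJOR = {
    **{major: "110" for major in (11, 13)},
    **{major: "120" for major in (15, 17, 19, 21, 23, 25, 27, 29)},
    41: "210",
    43: "220",
    **{major: "300" for major in range(31, 40, 2)},  # 31, 33, 35, 37, 39
    **{major: "405" for major in (45, 47)},
    49: "430",
    51: "510",
    53: "520",
}


def psoc_for_soc(soc_code):
    """Return the PSOC group for a SOC code, or None if it is not in table A-2."""
    major = int(str(soc_code).strip()[:2])
    return PSOC_BY_SOC_MAJOR.get(major)
```

For example, `psoc_for_soc("29-1141")` returns `"120"` (professional and related), and a major group absent from the table yields `None`.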

Suggested citation:

Travis A. Cyronek and Theodore To, "Federal government wage indexes," Monthly Labor Review, U.S. Bureau of Labor Statistics, April 2023, https://doi.org/10.21916/mlr.2023.9

Notes


1 Each basic Employment Cost Index (ECI) “cell” is categorized into industry and occupation groups. ECI cells are further separated into subcategories or “subcells.” These subcategories include full- or part-time work, Census division or region, establishment size, metropolitan or nonmetropolitan, New York–Chicago–Los Angeles area, union status, and time and incentive status. Our analysis includes only subcells for full- or part-time work, Census division, region, and establishment size.

2 An establishment is defined as an economic unit that produces goods or services, usually at a single physical location, and that is engaged in one or predominantly one type of economic activity. For more information, see U.S. Bureau of Labor Statistics glossary https://www.bls.gov/bls/glossary.htm#E.

3 U.S. Census, “General information about price indexes” (U.S. Census Bureau, n.d.), https://www.census.gov/construction/cpi/pdf/generalinformationaboutpriceindexes.pdf.

4 For a list of index formulas, see Wikipedia: The Free Encyclopedia, “List of price index formulas,” https://en.wikipedia.org/wiki/List_of_price_index_formulas; and U.S. Census, “General information about price indexes.”

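5 Typically, Laspeyres and Paasche index number formulas are expressed as a ratio of total wage costs, given period-0 and period-1 fixed employment baskets,

$$I_L = \frac{\sum_{i=1}^{n} w_i^1 e_i^0}{\sum_{i=1}^{n} w_i^0 e_i^0}$$

and

$$I_P = \frac{\sum_{i=1}^{n} w_i^1 e_i^1}{\sum_{i=1}^{n} w_i^0 e_i^1},$$

where $w_i^t$ is the hourly wage and $e_i^t$ is employment for job i in period t. After some manipulation of these formulas, the Laspeyres and Paasche indexes can also be expressed as the function of wage relatives and expenditure shares, as given in the main text.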

6 Since the Törnqvist and Fisher indexes are close approximations of one another (formulas produce numbers that are close to one another), we do not use the slightly more complicated Törnqvist index number formula.

7 OPM defines part-time work as between 16 and 32 hours a week and full-time work as more than 32 hours a week. In addition to full-time and part-time work, a number of other work schedules include full-time seasonal, part-time seasonal, intermittent, and intermittent seasonal. Our analysis only includes full-time and part-time workers.

8 We excluded military bases because they can have establishments such as schools, hospitals, entertainment venues, and so forth. Although nurses and teachers might be straightforward to classify into hospitals and schools, occupations such as janitors and secretaries would be challenging. U.S. Postal Service employee data are separately available from OPM and potentially could be included in the future.

9 “Fact sheet: computing hourly rates of pay using the 2,087-hour divisor” (U.S. Office of Personnel Management, n.d.), https://www.opm.gov/policy-data-oversight/pay-leave/pay-administration/fact-sheets/computing-hourly-rates-of-pay-using-the-2087-hour-divisor/.

10 Underlying the ECI is the Laspeyres index number formula.

11 Technically, ECI jobs are also differentiated by union status and time or incentive status. Union status is unavailable in our data, and to our knowledge, incentive pay is not widely used in the federal government.

12 Michael K. Lettau, Mark A. Loewenstein, and Aaron Cushner, “Is the ECI sensitive to the method of aggregation?” Monthly Labor Review, June 1997, https://www.bls.gov/opub/mlr/1997/06/art1full.pdf; and Michael K. Lettau, Mark A. Loewenstein, and Steve P. Paben, “Is the ECI sensitive to the method of aggregation? an update,” Monthly Labor Review, December 2002, https://www.bls.gov/opub/mlr/2002/12/art3full.pdf.

13 Ibid. For an explanation of this pattern reversal, see specifically Lettau et al. “Is the ECI sensitive to the method of aggregation?”

About the Author

Travis A. Cyronek
cyronek.travis@bls.gov

Travis A. Cyronek is a research economist in the Office of Compensation and Working Conditions, U.S. Bureau of Labor Statistics.

Theodore To
to.theodore@bls.gov

Theodore To is a research economist in the Office of Compensation and Working Conditions, U.S. Bureau of Labor Statistics.
