Article
April 2025

Impacts of COVID-19 on collection and missing data in the CPI

This article presents novel measures of the impact of the COVID-19 pandemic on the Consumer Price Index. One measure estimates the difference between prices collected online and prices collected in person. Another measure results from an alternative imputation method used when no sample is present for a component. A new metric for sample completeness illustrates the deterioration in the sample size more comprehensively than existing metrics. Lastly, this article presents the disparate impact of the pandemic on goods compared with services, in terms of data availability.

The COVID-19 pandemic presented unique challenges to the U.S. Bureau of Labor Statistics (BLS) in compiling the Consumer Price Index (CPI).1 In March 2020, in-person data collection at brick-and-mortar stores ceased almost overnight and switched to online collection, increasing the rate of survey nonresponse and missing data in the CPI surveys. To monitor the impacts of these issues on the CPI, BLS developed and began releasing new metrics each month.2 This article takes a closer look at some additional measures of the impact of the COVID-19 pandemic on the CPI’s Commodities and Services (C&S) survey.3 First, I examine price changes associated with changing the mode of collection from in-person to online. Next, I evaluate an alternative imputation method for when no sample is present in an elementary item-area. Then, I show a measure of data completeness as an extension to the existing measures that are used to determine index eligibility for publication. Finally, I present the rate at which products were missing from store shelves and websites, an indication of pandemic-related shortages.

The effects of collection modes on prices and data quality

When BLS selects the set of outlets (stores) from which it will track prices, it assigns each outlet either a specific physical location or an internet address.4 Each product or service that is observed and priced over time for use in the CPI is called a “quote.” Prior to the pandemic, BLS data collectors gathered most prices by visiting a physical store location. In March and April 2020, most data collection abruptly moved to websites (as shown in chart 1), and only in the second half of 2022 did it begin to return to physical locations in earnest. Chart 2 shows how many quotes switched collection mode in each month.

During March and April 2020, price changes in outlets in which the mode of collection changed from personal visits to online collection were reflected in the CPI.5 The price change resulting from switching to online collection varied by item category and outlet and cannot be separated from the price change that would have been observed had data collection continued in the physical locations. Price changes caused by the change in collection mode could arise in two ways: the exact same product was sold at a different price online than in physical stores, or the exact product was unavailable, a replacement product was selected, and the price change between the two (possibly adjusted for quality differences) was shown in the price index.

To estimate the impact of the switch to online collection, I estimate research indexes to compare the price change for quotes in which the mode of collection changed with those that did not change.6 To do so, I classify each observation used in the calculation of the CPI by its collection mode during the current month and the previous month. For example, a price could have been collected by personal visit during February 2020 and then switched over to collection via the store’s website in March 2020. The two collection patterns of interest are prices that were collected the same way in both periods and prices that were collected by personal visit in the earlier period and from a website during the later period.7 I exclude quotes that were off cycle, previously deleted, or carried forward.8 Within each combination of item-area and collection pattern, I compute an unweighted geometric average. To aggregate across areas, I take an unweighted arithmetic average over all the item-areas for each pattern. I perform no imputation; if an outlet did not change status in the manner described (collected by personal visit in both periods or collected by personal visit in the earlier period and from a website during the later period), I remove it from the calculations. Chart 3 presents the research price index for quotes that switch from in-person to online collection versus those that were collected online in both periods.
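
The two-stage averaging described above can be sketched as follows. This is a minimal illustration, not the BLS production calculation: it assumes the geometric average is taken over price relatives (current price divided by previous price), that excluded quotes have already been filtered out, and that the item-area names, pattern labels, and prices are all made up for the example.

```python
import math
from collections import defaultdict

def research_relatives(quotes):
    """Average price relatives by collection pattern.

    quotes: iterable of (item_area, pattern, prev_price, curr_price).
    All field names and labels here are hypothetical.
    """
    # Collect log price relatives within each (pattern, item-area) cell.
    logs = defaultdict(list)
    for item_area, pattern, prev_price, curr_price in quotes:
        logs[(pattern, item_area)].append(math.log(curr_price / prev_price))

    # Stage 1: unweighted geometric mean of the relatives within each cell.
    cell_means = defaultdict(list)
    for (pattern, _item_area), values in logs.items():
        cell_means[pattern].append(math.exp(sum(values) / len(values)))

    # Stage 2: unweighted arithmetic mean across item-areas for each pattern.
    return {p: sum(v) / len(v) for p, v in cell_means.items()}

# Made-up quotes: two patterns, two item-areas.
sample_quotes = [
    ("bread/Philadelphia", "visit_to_web", 2.00, 2.20),
    ("bread/Philadelphia", "visit_to_web", 4.00, 4.40),
    ("milk/Boston", "visit_to_web", 3.00, 3.00),
    ("bread/Philadelphia", "same_mode", 2.50, 2.50),
]
result = research_relatives(sample_quotes)
```

Because no imputation is performed, a quote that fits neither pattern simply never enters `sample_quotes`.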

Generally, steady or declining prices are observed with the change in collection method from in-person to online collection. However, pricing food online rather than in person was associated with higher prices. The results are presented in table 1.

Table 1. Change in collection method for the CPI, by major group, March and April 2020
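| Major group | Collection type | March 2020 mean | March 2020 median | March 2020 quotes | April 2020 mean | April 2020 median | April 2020 quotes |
|---|---|---|---|---|---|---|---|
| Apparel | Personal visit in both periods | 1.8 | 0.0 | 2,490 | [1] | [1] | 7 |
| Apparel | Personal visit in the first period, online in the second period | -5.8 | 0.0 | 1,216 | -10.5 | 0.0 | 2,810 |
| Apparel | Online in both periods | -1.4 | 0.0 | 2,549 | -7.7 | 0.0 | 2,932 |
| Education | Personal visit in both periods | 0.7 | 0.0 | 236 | [1] | [1] | 2 |
| Education | Personal visit in the first period, online in the second period | -0.6 | 0.0 | 173 | -1.1 | 0.0 | 254 |
| Education | Online in both periods | -0.3 | 0.0 | 3,070 | 0.0 | 0.0 | 2,978 |
| Food | Personal visit in both periods | 0.0 | 0.0 | 16,803 | [1] | [1] | 36 |
| Food | Personal visit in the first period, online in the second period | 2.1 | 0.0 | 5,843 | 3.2 | 0.0 | 10,636 |
| Food | Online in both periods | 0.1 | 0.0 | 330 | 1.2 | 0.0 | 6,210 |
| Other | Personal visit in both periods | -0.1 | 0.0 | 1,007 | [1] | [1] | 5 |
| Other | Personal visit in the first period, online in the second period | -1.0 | 0.0 | 315 | -0.2 | 0.0 | 712 |
| Other | Online in both periods | 1.5 | 0.0 | 244 | -0.9 | 0.0 | 323 |
| Housing, excluding rent and owners’ equivalent rent | Personal visit in both periods | 0.5 | 0.0 | 1,415 | [1] | [1] | 8 |
| Housing, excluding rent and owners’ equivalent rent | Personal visit in the first period, online in the second period | -1.6 | 0.0 | 537 | 0.5 | 0.0 | 1,419 |
| Housing, excluding rent and owners’ equivalent rent | Online in both periods | -1.8 | 0.0 | 1,740 | -2.0 | 0.0 | 1,860 |
| Medical | Personal visit in both periods | 1.8 | 0.0 | 648 | [1] | [1] | 4 |
| Medical | Personal visit in the first period, online in the second period | -2.2 | 0.0 | 148 | -1.3 | 0.0 | 380 |
| Medical | Online in both periods | -1.0 | 0.0 | 59 | -0.2 | 0.0 | 98 |
| Recreation | Personal visit in both periods | -0.5 | 0.0 | 1,223 | [1] | [1] | 3 |
| Recreation | Personal visit in the first period, online in the second period | -1.3 | 0.0 | 628 | -2.0 | 0.0 | 1,295 |
| Recreation | Online in both periods | 0.1 | 0.0 | 2,299 | -0.3 | 0.0 | 2,273 |
| Transportation | Personal visit in both periods | -2.7 | -1.7 | 3,131 | [1] | [1] | 12 |
| Transportation | Personal visit in the first period, online in the second period | -10.9 | -9.5 | 467 | -13.6 | -12.9 | 1,490 |
| Transportation | Online in both periods | -11.5 | 0.0 | 2,176 | -13.9 | -2.5 | 2,255 |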

[1] Suppressed to avoid identifying any survey respondents.

Source: U.S. Bureau of Labor Statistics.

As personal-visit collection resumed, BLS used the difference between the prepandemic price and the price after in-person visits resumed to calculate the price change over that period.9 In the preceding analysis, I do not attempt to compute the price change associated with the return to in-person data collection, for several reasons. First, the shift from in-person to online data collection occurred over 2 months, whereas the return to data collection in stores is occurring over many months. Therefore, in any given month, relatively few quotes are changing collection modes, and this transition is still ongoing, as shown in chart 1. Second, it would be difficult to separate the effects of the inflationary trend in late 2021 and 2022 from any price change associated with changing the data collection from online to in-person. Third, routine sample rotation and product replacements would leave relatively few cases in which the same unique product at the same physical location of the same store could be observed in early 2020 and again in late 2022.

Missing data

The CPI C&S survey has imputation procedures to estimate price change when a price is unavailable. Although the CPI surveys experienced a reduction in the number of collected prices during the COVID-19 pandemic, BLS did not change the CPI imputation procedures.10

CPI background and traditional missing-data process

The CPI is structured into a set of basic (or elementary) item categories, such as bread, and basic (or elementary) geographic areas, such as Philadelphia. There are 243 item categories and 32 areas, yielding 7,776 basic item-area cells (for example, bread in Philadelphia). When no prices are collected for an entire basic cell, the price change for the cell needs to be imputed. Although the need for this imputation is relatively rare, it does occur in item categories in which gaining cooperation from the survey respondents is difficult or unusually burdensome. In recent years, the frequency of missing cells increased (as shown in table 2). The practice BLS uses in the CPI is to impute the price change for the missing basic item-area cell from the price change of the same basic item in a nearby basic area. For example, the basic cell for bread in Philadelphia would be imputed from the basic cell for bread for smaller cities in the mid-Atlantic. An alternative approach would be to impute the price change by first computing the item category price change at the national level, excluding any missing areas. Then one could impute the missing cell with the value of the category at the national level and recompute the national estimate based on all the areas.11 A potential downside is that this would dampen variance because missing price changes would be replaced by the average price change. Although the impact of nationally imputed values is generally small, nationally imputing prices can have a larger impact in item categories with small samples or respondent burden issues. For example, the impact on the all-items U.S. city-average level is negligible, but domestic services has a small sample, so the impact of the alternative imputation approach is more noticeable, as shown in charts 4 and 5.
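
The alternative (national) imputation can be sketched as below. This is a simplified illustration under stated assumptions: it averages 1-month percent changes with fixed area weights, whereas the CPI aggregates index levels with expenditure weights. The area names, weights, and price changes are hypothetical.

```python
def national_change(changes, weights):
    """Weighted mean price change over the areas that have data."""
    present = [area for area in changes if changes[area] is not None]
    total = sum(weights[area] for area in present)
    return sum(weights[area] * changes[area] for area in present) / total

def impute_nationally(changes, weights):
    """Fill missing areas with the national change of the non-missing areas."""
    fill = national_change(changes, weights)
    return {area: (fill if chg is None else chg) for area, chg in changes.items()}

# Hypothetical 1-month percent changes for one item category; None = no sample.
changes = {"Philadelphia": None, "Boston": 0.4, "Atlanta": 0.1}
weights = {"Philadelphia": 0.5, "Boston": 0.3, "Atlanta": 0.2}

before = national_change(changes, weights)   # national change, missing area excluded
filled = impute_nationally(changes, weights)
after = national_change(filled, weights)     # national change over all areas
```

In this sketch `after` equals `before`: filling a missing cell with the national average leaves the national estimate unchanged, which is the property the alternative approach relies on. The filled cell still matters for any subnational aggregate that contains it.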

The biggest difference when one imputes the missing cells with national data is in food at employee sites and schools. However, during the COVID-19 pandemic many school lunches were free, which introduced unusually large price changes to this item category. Even with these extreme price changes in food at employee sites and schools, the difference between the official index and the nationally imputed index is around 1.4 percentage points per year (see chart 6).

Differences in index values are one way to look at alternative imputation methods. Another possibility is to count how many cells are imputed, as shown in table 2. In the months that had the most imputations, the number of item-areas without sample increased from 2020 through mid-2023. The pandemic forced some delays to the sample rotation process in which BLS selects a new set of stores from which to obtain prices. Generally, approximately one-eighth of the sample rotates every 6 months. Table 2 shows the research index 1-month percent change minus the official index 1-month percent change for each basic cell.12 I then take the median difference over the basic item-area cells with missing values in each month. The differences tend to be small (see table 3). The increase in missing item-areas from 2020 through 2023 can be at least partially explained by operational obstacles. Gaining respondent cooperation and initiating products is more challenging without a personal visit and can often cause delays.

Table 2. Imputed cells in the CPI and the difference between official and alternative imputation techniques, January 2020–July 2023
| Period | Imputed cells | Median difference between the percent changes in the experimental and production indexes |
|---|---|---|
| January 2020 | 186 | 0.14 |
| February 2020 | 184 | 0.03 |
| March 2020 | 180 | 0.07 |
| April 2020 | 179 | 0.31 |
| May 2020 | 188 | 0.11 |
| June 2020 | 202 | 0.07 |
| July 2020 | 201 | -0.03 |
| August 2020 | 184 | 0.07 |
| September 2020 | 184 | 0.18 |
| October 2020 | 184 | 0.13 |
| November 2020 | 185 | 0.00 |
| December 2020 | 185 | 0.08 |
| January 2021 | 184 | 0.09 |
| February 2021 | 183 | 0.22 |
| March 2021 | 182 | 0.17 |
| April 2021 | 181 | 0.05 |
| May 2021 | 242 | 0.06 |
| June 2021 | 260 | 0.00 |
| July 2021 | 251 | 0.02 |
| August 2021 | 250 | 0.06 |
| September 2021 | 239 | 0.44 |
| October 2021 | 264 | 0.11 |
| November 2021 | 293 | 0.00 |
| December 2021 | 293 | 0.07 |
| January 2022 | 293 | 0.00 |
| February 2022 | 291 | 0.22 |
| March 2022 | 291 | 0.19 |
| April 2022 | 284 | 0.29 |
| May 2022 | 326 | 0.23 |
| June 2022 | 366 | 0.26 |
| July 2022 | 331 | 0.10 |
| August 2022 | 321 | 0.05 |
| September 2022 | 314 | 0.01 |
| October 2022 | 338 | 0.07 |
| November 2022 | 393 | 0.06 |
| December 2022 | 383 | 0.28 |
| January 2023 | 372 | 0.00 |
| February 2023 | 364 | 0.27 |
| March 2023 | 345 | 0.20 |
| April 2023 | 339 | 0.08 |
| May 2023 | 334 | -0.01 |
| June 2023 | 328 | 0.01 |
| July 2023 | 327 | 0.22 |

Note: CPI = Consumer Price Index.

Source: U.S. Bureau of Labor Statistics.

Table 3. Missing basic cells in the CPI and their relative importance
| Category[1] | Percentage of basic cells missing from January 2020 to July 2023 (percent) | Relative importance, U.S. city-average, December 2019, CPI-U |
|---|---|---|
| Repair of household items | 53 | 0.117 |
| Gardening and lawncare services | 39 | 0.303 |
| Leased cars and trucks | 30 | 0.646 |
| Legal services | 26 | 0.250 |
| Domestic services | 24 | 0.291 |

[1] Fuel oil is among the top categories at 49.9 percent over the study period, but it is excluded from this table. Missing cells for fuel oil antedate the pandemic: because heating oil is not used in warm areas of the United States, it is not sampled in those areas for the Consumer Price Index.

Note: CPI = Consumer Price Index. CPI-U = Consumer Price Index for All Urban Consumers.

Source: U.S. Bureau of Labor Statistics.

Density, adequacy, and the impact on publication

Whether BLS publishes an index depends on the “adequacy” of the index. A cell is adequate if it has at least one collected price and inadequate if it has no collected prices. Adequacy for an aggregate index is a weighted arithmetic average of its constituent basic cells’ adequacies, where the weights are the expenditure shares of each basic item-area cell to the aggregate item-area. All adequacy values are between 0.0 and 1.0, and for an aggregate index to be published, its weighted adequacy must exceed 0.5.13

Even with the difficulty in collecting data during the pandemic, BLS has maintained publication at the U.S. city-average level. From January 2020 through July 2023, there was a small drop in the number of published not seasonally adjusted indexes at the U.S. city-average level (see chart 7).

A density approach to publication criteria

There are several approaches for handling heavily imputed indexes.14 The approach used by Eurostat and a similar one used by the United Kingdom’s Office for National Statistics (ONS) depend on the share of data successfully collected. The ONS imputes an index if more than 20 percent of the data are missing, and both agencies flag indexes in which more than 50 percent of the weight of the index is imputed. In both cases, the indexes are published, unlike in the United States, where indexes that fail to reach an adequacy of 0.5 are withheld from publication. The dramatic effect of COVID-19 on BLS’s ability to collect data presents an opportunity to see what a share-of-data approach to publication might have looked like in the CPI. For this analysis, I will call this approach the “density” approach to publication. BLS has no current plans to change to using a density measure as a publication criterion and, outside of the pandemic period, these two approaches yield similar publication results.

Index density is a weighted percentage of quotes that were successfully collected. This measure has a stronger connection to the amount of data collected than a traditional adequacy measure. For example, suppose there are 100 quotes in a basic item category-area. The basic cell adequacy is the same whether 1 or 100 quotes are collected, but the density measure will increase as the sample moves from 1 quote to 100 quotes. The basic-cell densities can then be aggregated in the same manner as that for adequacy.
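
The contrast between adequacy and density can be sketched in a few lines. This is an illustration only: it assumes the basic-cell density is the simple share of sampled quotes that were collected (the article describes it only as a weighted percentage), and the quote counts and expenditure weights are hypothetical.

```python
def cell_adequacy(collected):
    """A basic cell is adequate (1.0) if at least one price was collected."""
    return 1.0 if collected >= 1 else 0.0

def cell_density(collected, sampled):
    """Assumed definition: share of the cell's sampled quotes that were collected."""
    return collected / sampled if sampled else 0.0

def aggregate(values, weights):
    """Weighted arithmetic average, used for both adequacy and density."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total

# Hypothetical basic cells: (quotes collected, quotes sampled, expenditure weight).
cells = [
    (1, 100, 0.6),
    (50, 100, 0.3),
    (0, 20, 0.1),
]
cell_weights = [w for _, _, w in cells]
adequacy = aggregate([cell_adequacy(c) for c, _, _ in cells], cell_weights)
density = aggregate([cell_density(c, s) for c, s, _ in cells], cell_weights)
```

With these made-up cells, the aggregate comfortably clears the 0.5 adequacy threshold even though only a small share of its quotes was collected, which is the situation the density measure is meant to expose.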

Results for density

As shown in chart 8, under both criteria (density and adequacy) the all-items index would meet publication criteria.15 The new information here is that the density graph shows more clearly the reduction in collected prices at the beginning of the pandemic, followed by a partial recovery.

One can also look at density for each major group. As chart 9 shows, density drops in the food and beverages, other goods and services, and recreation major groups, but the education and communication major group was steady over the period studied.

Next, I compare density and adequacy for some specific categories that experienced notable decreases in adequacy during the pandemic. One of the more high-profile examples is leased vehicles (see chart 10). In leased vehicles, both the adequacy and density drop. In this case, the density measure is not providing much information about data quality that was not already present in the traditional adequacy statistic. Generally, there is a strong correlation between the existing adequacy measure and this new density measure. At the U.S. city-average level for each basic item category from January 2020 to July 2023, the correlation coefficient is 0.87.

Even with a high correlation between density and adequacy, there are a handful of food series in which there was little deterioration in adequacy but density dropped noticeably. Chart 11 shows the indexes for one such series, ham.

Some series became unpublishable on a recurring basis after the start of the pandemic. For these series, the value of density relative to adequacy is unclear. There are three basic CPI item categories that were inadequate at the national level for more than 3 months from March 2020 to July 2023: gardening and lawncare (for 25 months), repair of household items (for 23 months), and leased vehicles (for 25 months). The first two categories were inadequate for 2 months in the 12 months prior to March 2020, while prior to March 2020, leased vehicles was always adequate. For these basic item categories, the adequacy and density are similar, and again it does not seem that density is providing much new information (see chart 12).

Possible impact of density on publication

Density and adequacy often seem to track closely. One could, however, ask how publication rates would change under a rule requiring a density level of at least 50 percent for publication at the national level.16 Looking at the published indexes at the U.S. city-average level from January 2020 to July 2023, one finds 948 item-months at the national level that were published but would not have been published at a 50-percent threshold for density. With 6,586 item-months at the national level in the dataset, this would reduce the number of published series by 14 percent.
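
The comparison above amounts to counting item-months that pass the adequacy rule but fail a density rule. The sketch below shows that count on made-up (adequacy, density) pairs and then checks the 14-percent figure from the two totals reported in the text. The function name and sample pairs are hypothetical.

```python
def lost_under_density(item_months, threshold=0.5):
    """Count item-months published under adequacy that would fail the density rule.

    item_months: iterable of (adequacy, density) pairs.
    Publication requires adequacy to exceed the threshold; the hypothetical
    density rule requires density of at least the threshold.
    """
    return sum(
        1
        for adequacy, density in item_months
        if adequacy > threshold and density < threshold
    )

# Made-up item-months for illustration.
sample = [(0.9, 0.6), (0.8, 0.3), (0.4, 0.2), (1.0, 0.5)]
lost = lost_under_density(sample)

# Share implied by the totals reported in the article.
share_lost = 948 / 6586
percent_lost = round(100 * share_lost)
```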

Table 4. Number of indexes that have more than 50 percent adequacy but less than 50 percent density, January 2020–July 2023
| Item | Number of indexes |
|---|---|
| Medical care | 25 |
| Apparel | 5 |
| Recreation | 3 |
| Other goods and services | 2 |
| New vehicles | 37 |
| Women's dresses | 36 |
| Hospital services | 36 |
| Motor vehicle repair | 36 |
| Women's outerwear | 35 |
| Food from vending machines and mobile vendors | 35 |
| Domestic services | 34 |
| Admissions | 34 |
| Fees for lessons or instructions | 34 |
| Motor vehicle body work | 34 |
| Care of invalids and elderly at home | 33 |
| Photographers and photo processing | 32 |
| Sports vehicles including bicycles | 29 |
| Legal services | 28 |
| Food at employee sites and schools | 24 |
| Physicians' services | 24 |
| Services by other medical professionals | 23 |
| Women's suits and separates | 18 |
| Other furniture | 18 |
| Indoor plants and flowers | 18 |
| Gardening and lawncare services | 17 |
| Repair of household items | 17 |
| Clocks, lamps, and decorator items | 16 |
| Leased cars and trucks | 16 |
| Other food away from home | 14 |
| Motor vehicle maintenance and servicing | 14 |
| Computers, peripherals, and smart home assistants | 13 |
| Living room, kitchen, and dining room furniture | 12 |
| Televisions | 12 |
| Wine at home | 11 |
| Dishes and flatware | 11 |
| Cigarettes | 10 |
| Bedroom furniture | 10 |
| Men's shirts and sweaters | 9 |
| Household paper products | 9 |
| Toys | 8 |
| Ham | 7 |
| Other intercity transportation | 7 |
| Other uncooked poultry including turkey | 6 |
| Beer, ale, and other malt beverages at home | 6 |
| Men's suits, sport coats, and outerwear | 5 |
| Women's underwear, nightwear, swimwear, and accessories | 5 |
| Financial services | 5 |
| Window coverings | 5 |
| Nursing homes and adult day services | 5 |
| Sports equipment | 5 |
| Club membership for shopping clubs, fraternal, or other organizations, or participant sports fees | 5 |
| Boys' and girls' footwear | 4 |
| Haircuts and other personal care services | 4 |
| Apparel services other than laundry and dry cleaning | 4 |
| Moving, storage, freight expense | 4 |
| Sewing machines, fabric and supplies | 4 |
| Outdoor equipment and supplies | 3 |
| Car and truck rental | 3 |
| Men's underwear, nightwear, swimwear, and accessories | 2 |
| Women's footwear | 2 |
| Uncooked other beef and veal | 2 |
| Other pork including roasts, steaks, and ribs | 2 |
| Fresh fish and seafood | 2 |
| Full service meals and snacks | 2 |
| Tobacco products other than cigarettes | 2 |
| Dental services | 2 |
| Eyeglasses and eye care | 2 |
| Pet services including veterinary | 2 |
| Daycare and preschool | 1 |
| Rice, pasta, cornmeal | 1 |
| Other fresh fruits | 1 |
| Potatoes | 1 |
| Tomatoes | 1 |
| Floor coverings | 1 |
| Other appliances | 1 |
| Nonelectric cookware and tableware | 1 |
| Photographic equipment and supplies | 1 |

Source: U.S. Bureau of Labor Statistics.

Temporary stock-outs

Past research using BLS data from the Commodities and Services (C&S) survey has looked at rates of missing prices to estimate how often stores are out of particular products.17 Taking a cue from this work, I attempt to see how often a quote that was collected as “temporarily unavailable” later becomes available again. Temporary stock-out in the C&S data would be an indication of how often stores are missing products that they typically have for sale.18 This missingness could be caused by supply chain, transportation, and labor issues during the pandemic.19

For each month from January 2020 to July 2023, I identify quotes that were unavailable. This includes quotes in outlets in which the entire outlet was unavailable (for example, the outlet was temporarily closed or had no online presence). Then, for each unavailable quote, I examine the data for the next 6 months to see what eventually happened to the quote.20 Chart 13 shows rates of temporarily unavailable quotes returning, quotes being replaced, and those disappearing for some other reason, for all quotes.
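
The look-ahead classification can be sketched as follows. This is a hedged illustration, not the procedure actually run on C&S data: the status labels, the rule of taking the first decisive event within the 6-month window, and the quote histories are all assumptions made for the example.

```python
from collections import Counter

def classify_outcome(later_statuses, horizon=6):
    """Classify an unavailable quote from its statuses in the following months.

    later_statuses: monthly status labels after the month of unavailability.
    The labels ("collected", "replaced", "deleted", "rotated_out",
    "unavailable") are hypothetical.
    """
    for status in later_statuses[:horizon]:
        if status == "collected":
            return "returned"      # the same product came back
        if status == "replaced":
            return "replaced"      # a substitute product was selected
        if status in ("deleted", "rotated_out"):
            return "other"
    return "other"                 # still unavailable after the window

# Made-up quote histories following a month of unavailability.
histories = {
    "q1": ["unavailable", "collected"],
    "q2": ["replaced"],
    "q3": ["unavailable"] * 6,
    "q4": ["deleted"],
}
outcomes = Counter(classify_outcome(h) for h in histories.values())
rates = {k: v / len(histories) for k, v in outcomes.items()}
```

Each quote lands in exactly one of the three categories, so the rates for a given month sum to one.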

Charts 14 and 15 show the rates of temporarily unavailable quotes (and their eventual outcomes) for goods and services, respectively. The impact of the COVID-19 pandemic and the switch to online collection on the availability of goods was generally greater than that on services.

Some of these categories, like groceries and paper products, were known to have supply chain issues at the start of the pandemic.21 The rate at which quotes were unavailable has recovered fully, but only relatively recently, as shown by the dotted lines in charts 16 and 17.

Conclusion

This article explores four topics about the impact of COVID-19 on CPI operations. First, I construct a price index that examines the impact of changing to web collection from the traditional brick-and-mortar collection. In certain categories, a price difference is associated with the change from in-person to online price collection. Next, I carry out an alternative imputation of missing cells. This change has an impact in some categories with particularly acute nonresponse issues. Then, I compute a measure of data completeness that more fully considers what is occurring in terms of collected prices at the lowest level. Here, for certain categories, the new statistics show some aspects of data deterioration that were not immediately clear with existing metrics. Lastly, I measure the rate at which products are missing from store shelves and later return. These new metrics show that the data collection disruptions caused by the COVID-19 pandemic were substantial, but in many cases the new metrics have returned to their prepandemic levels.

Suggested citation:

Mark Bowman, "Impacts of COVID-19 on collection and missing data in the CPI," Monthly Labor Review, U.S. Bureau of Labor Statistics, April 2025, https://doi.org/10.21916/mlr.2025.8

Notes


1 For the purpose of this analysis, the pandemic began in March 2020, when many travel restrictions were put in place. For most people, COVID-19's impact on day-to-day life has slowly receded since. The World Health Organization ended its declaration of a global emergency in May 2023. For more information, see “Coronavirus disease (COVID-19) pandemic” (World Health Organization, accessed January 2025), https://www.who.int/europe/emergencies/situations/covid-19.

2 “Effects of COVID-19 pandemic and response on the Consumer Price Index,” Consumer Price Index COVID-19 Impact Summaries (U.S. Bureau of Labor Statistics, January 30, 2023), https://www.bls.gov/covid19/effects-of-covid-19-pandemic-on-consumer-price-index.htm.

3 The Consumer Price Index (CPI) comprises two surveys, the Commodities and Services (C&S) survey and the Housing survey.

4 For more detailed information, see Handbook of Methods (U.S. Bureau of Labor Statistics, last modified January 30, 2025), https://www.bls.gov/opub/hom/cpi/home.htm.

5 Online collection of observations that were previously collected by personal visit would exclude shipping, delivery, or pickup fees.

6 These indexes, calculated with an unweighted geometric average, have no way to differentiate the causes of the price change. However, the method used in this article seems a reasonable way to obtain an overall measure, at the risk of oversimplifying the way the U.S. Bureau of Labor Statistics (BLS) calculates the CPI.

7 The same exact observations do not make up each research series. A price could switch from personal visit to web and its price change would be shown in the research series, but that price would never be included in future months, so this research price index doesn’t have the desirable property of returning to its previous level if all underlying prices returned to their previous levels.

8 Previously deleted quotes are those for which data collection is no longer being attempted because the quotes were deleted. Off-cycle quotes are quotes for which data collection is not being attempted in a month because data collection only occurs every other month. Carried forward quotes are a set of quotes in which data collection is not being attempted in a month because the prices are assumed to be unchanged (for example, college tuition is collected at the beginning of the semester or quarter).

9 This method is consistent with international guidance on producing consumer price indexes under lockdown. See section 2.65 in “Recommencing regular price collection after lockdown,” in Guide on Producing CPI Under Lockdown, by the United Nations Economic Commission for Europe (United Nations, 2021), pp. 16–18, https://unece.org/sites/default/files/2021-12/Guide%20on%20producing%20CPI%20under%20lockdown%20WEB%20version.pdf.

10 This impact on quote counts has been documented on the BLS COVID-19 impact page. See “Impact of the coronavirus (COVID-19) pandemic on Consumer Price Index data for October 2021” (U.S. Bureau of Labor Statistics, last modified November 22, 2021), https://www.bls.gov/covid19/consumer-price-index-covid19-impacts-october-2021.htm.

11 Recomputing the national estimate on the basis of all the areas has the advantage of not allowing imputed data to impact aggregate estimates (with respect to complete-case analysis) and is more in line with international practices. See section 3.20 in “Imputation methods,” in Guide on Producing CPI Under Lockdown, by the United Nations Economic Commission for Europe, pp. 23–30.

12 Though not always appropriate, adding and subtracting percentages is valid under certain circumstances. For additional information on when and why this can be appropriate, see Haipeng Chen and Akshay R. Rao, “When two plus two is not equal to four: errors in processing multiple percentage changes,” Journal of Consumer Research vol. 34, no. 3, October 1, 2007, pp. 327–340, https://doi.org/10.1086/518531.

13 Although adequacy values are always between 0.0 and 1.0, every basic cell that makes up the adequacy value is either a zero or a one.

14 See section 5.3, in “Communication with users and stakeholders,” in Guide on Producing CPI Under Lockdown, by the United Nations Economic Commission for Europe, p. 41.

15 That is, if one were to assume that the publication threshold should be set at 50 percent for density.

16 This 50-percent threshold is the same as the one used for adequacy, but given how adequacy and density are defined, it is impossible for density to exceed adequacy. If the CPI were to incorporate a density formula to determine publication status, a thorough study of the relationship between density and the standard errors should be performed first.

17 See Mark Bils, “Deducing markups from stockout behavior,” Research in Economics vol. 70, no. 2, June 2016, pp. 320–331; and David A. Matsa, “Competition and product quality in the supermarket industry,” The Quarterly Journal of Economics vol. 126, no. 3, August 2011, pp. 1539–1591.

18 This is an imperfect measure. Because the purpose of the C&S survey is to collect price data that supports inflation calculations and not rates of stock-outs, a survey to measure inventories would be designed differently.  

19 Of course, the purpose of the C&S survey is to collect price data that supports inflation calculations and not rates of stock-outs, and a survey to measure inventories would be designed differently.

20 I categorized the quotes as follows: quotes in which the same product returned; quotes in which the unique product was substituted (replaced) with another unique product; and a third, catch-all category that includes quotes that were deleted, simply rotated out of the sample, or remained unavailable. I selected these categories because they are mutually exclusive and collectively exhaustive.

21 Charts 13 to 17 are truncated to April 2023 because it is unknown what will happen to quotes that were unavailable in months that were recent when this analysis was completed in August 2023.

About the Author

Mark Bowman
bowman.mark@bls.gov

Mark Bowman is an economist in the Office of Prices and Living Conditions.
