Consumer Price Index: Design

This is an archived page. To see the latest version, please visit Consumer Price Index: Design.

The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by consumers for a representative basket of consumer goods and services. The CPI measures inflation as experienced by consumers in their day-to-day living expenses. The CPI is used to adjust income eligibility levels for government assistance, federal tax brackets, federally mandated cost of living increases, private sector wage and salary increases, and consumer and commercial rent escalations. Consequently, the CPI directly affects hundreds of millions of Americans.

The CPI measures this average price change for two target populations: All Urban Consumers (CPI-U population) and Urban Wage Earners and Clerical Workers (CPI-W population).

Both the CPI-U and the Chained CPI (C-CPI-U) use the CPI-U population. The CPI-U population constitutes about 93 percent of the U.S. population, and covers households in all areas of the United States, specifically, all urban households in core-based statistical areas (CBSAs) and in urban places of 10,000 inhabitants or more.¹ Not covered are people living in rural nonmetropolitan areas, in farm households, on military installations, in religious communities, and in institutions such as prisons and mental hospitals.

The CPI-W population is a subset of the CPI-U population. It consists of all CPI-U population households in which at least one member has been employed for 37 weeks or more in an eligible occupation and in which 50 percent or more of household income comes from wage earnings associated with an eligible occupation. Eligible occupations include clerical workers, sales workers, protective and other service workers, laborers, and construction workers. The CPI-W population excludes households of professional and salaried workers, part-time workers, the self-employed, and the unemployed, along with households with no one in the labor force, such as those of retirees. The CPI-W share of the total U.S. population has diminished over the years and is now about 29 percent.
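The two CPI-W conditions can be written as a simple check. The sketch below is illustrative only: the function name, argument names, and sample households are hypothetical, and it assumes the household is already part of the CPI-U population and that its income has already been split into eligible wage earnings and the total.

```python
MIN_WEEKS_EMPLOYED = 37          # "37 weeks or more", as stated in the text
MIN_ELIGIBLE_INCOME_SHARE = 0.5  # "50 percent or more", as stated in the text


def in_cpi_w_population(weeks_by_member, eligible_wage_income, total_income):
    """Return True if a CPI-U household also meets both CPI-W conditions.

    weeks_by_member: weeks each household member worked in an eligible
    occupation (hypothetical input layout, not a BLS data format).
    """
    has_eligible_worker = any(w >= MIN_WEEKS_EMPLOYED for w in weeks_by_member)
    if total_income <= 0:
        return False
    income_share = eligible_wage_income / total_income
    return has_eligible_worker and income_share >= MIN_ELIGIBLE_INCOME_SHARE


# Hypothetical households.
print(in_cpi_w_population([40, 0], 30_000, 50_000))  # 60 percent eligible income
print(in_cpi_w_population([52], 20_000, 50_000))     # 40 percent eligible income
```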

The sample

The CPI collects prices for approximately 80,000 goods and services. Prices are collected each month in 75 urban areas across the country from about 6,000 housing units and approximately 23,000 retail establishments, including department stores, supermarkets, hospitals, gas stations, and other types of stores and service establishments. All taxes directly associated with the purchase and use of items are included in the index. Prices of fuels and a few other items are obtained every month in all 75 locations, while prices of most other commodities and services are collected every month in the three largest geographic areas (Chicago, Los Angeles, and New York) and every other month in other areas. Prices of goods and services are obtained primarily through personal visits or telephone calls by BLS data collectors, though some prices are collected directly from websites. Definitions of entry-level items are available in appendix 2.

In calculating the index, price changes for the various items in each location are averaged together using weights that represent their importance in the spending of the appropriate population group. Local data are then aggregated to obtain a U.S. city average. For the CPI-U and CPI-W, separate indexes are also published by size of city, by census region, by census division, for cross-classifications of regions and population-size classes, and for 23 local areas. For the C-CPI-U, data are published only at the national level. The CPI-U and CPI-W indexes are considered final when released, but the C-CPI-U index is issued in preliminary form and is subject to three quarterly revisions before the final version.
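As a rough illustration of the weighting step, the sketch below combines area-level price relatives using expenditure weights. The area names, weights, and relatives are invented, and the weighted arithmetic mean is an assumption chosen for simplicity; it should not be read as the formula BLS applies at any particular level of aggregation.

```python
def weighted_average_change(relatives, weights):
    """Expenditure-weighted arithmetic mean of price relatives."""
    total_weight = sum(weights[k] for k in relatives)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(relatives[k] * weights[k] for k in relatives) / total_weight


# Hypothetical price relatives (current price / previous price) by area.
area_relatives = {"area_a": 1.02, "area_b": 1.01, "area_c": 0.99}
# Hypothetical expenditure shares for the same areas.
area_weights = {"area_a": 0.5, "area_b": 0.3, "area_c": 0.2}

us_relative = weighted_average_change(area_relatives, area_weights)
print(f"aggregate price relative: {us_relative:.4f}")
```

Areas with larger expenditure shares move the aggregate more, which is the sense in which the weights "represent their importance" in spending.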

In order to select stores and items to calculate indexes, BLS implements a sampling design that constructs the sampling frames from which a random sample of stores, consumer items, and expenditure weights can be drawn. This section describes the basic elements of CPI sampling design and the steps taken to implement the design.

Multistage sampling design: areas, outlets, and items

The CPI sample-design process involves multiple stages. In the first stage, a sample of geographic areas is selected. In subsequent stages, BLS selects a sample of outlets in which area residents make retail purchases, a sample of specific retail goods and services that area residents buy, and a sample of residential housing units. The samples are rotated on a regular basis; the geographic sample has traditionally been rotated once after each decennial census.

Area sample

Effective with the 2018 redesign based on the 2010 census, the current geographic sample (appendix 1) was introduced over a multiyear span beginning in 2018. The area definitions are based on the CBSAs delineated by the Office of Management and Budget (OMB) in 2013.²

Area sampling steps

1. Determine sample classification variables.

In the current sample design, areas were first classified into one of nine census divisions: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, and Pacific. The census divisions represent a further breakdown of census regions. Each area was also classified into one of two population-size classes: self-representing or non-self-representing. Areas with populations above 2.5 million are defined as self-representing; their weight in the CPI corresponds to their population relative to the U.S. population. Areas with populations below 2.5 million represent not only themselves, but also other areas in their region and size class.
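The size-class rule reduces to a single population threshold. The following sketch uses hypothetical names and populations and applies only that threshold; it does not model any area that is treated as self-representing for other reasons.

```python
SELF_REPRESENTING_THRESHOLD = 2_500_000  # population cutoff stated in the text


def size_class(population):
    """Classify an area by population alone.

    Assumption: an area exactly at the threshold is treated as
    non-self-representing; the text does not specify the boundary case.
    """
    if population > SELF_REPRESENTING_THRESHOLD:
        return "self-representing"
    return "non-self-representing"


# Hypothetical area populations.
for name, population in {"large_area": 3_000_000, "small_area": 400_000}.items():
    print(name, size_class(population))
```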

[Figure: Map of the United States divided into its 4 census regions and 9 census divisions]

2. Construct primary sampling units.

The current definitions assign counties surrounding an urban core area to geographic entities (the CBSAs). The assignment is based on each county’s degree of economic and social integration (as measured by commuting patterns) to the urban core. There are two types of CBSAs: metropolitan and micropolitan. A metropolitan CBSA has an urban core of more than 50,000 people, and a micropolitan CBSA has an urban core of 10,000 to 50,000 people. Note that CBSAs may cross state borders.

Metropolitan and micropolitan CBSA definitions were used for non-self-representing areas. With the introduction of the CBSA concept to the CPI, the CPI-U covers 93 percent of the U.S. population reflected in the 2010 census. The area sample frame comprises 381 metropolitan CBSAs, representing approximately 85 percent of the population, and 536 micropolitan CBSAs, representing approximately 9 percent of the population.

3. Determine the number of sampled PSUs.

With the 2018 area revision, the CPI program reduced the total number of primary sampling units (PSUs) in the CPI from 87 to 75. These PSUs include 21 areas whose population is greater than 2.5 million and 2 additional areas: Anchorage, AK, and Honolulu, HI. Anchorage represents all CBSAs in Alaska, and Honolulu represents all CBSAs in Hawaii. These CBSAs are unique because the locations of both states make price change in their markets geographically isolated from that in other markets, so the CBSAs in Alaska and Hawaii are treated as separate geographic strata. These 23 self-representing PSUs are combined with 52 non-self-representing PSUs to form 75 total PSUs. For purposes of index calculation, the 75 PSUs are consolidated into 32 index areas. Thus, the current area design yields 7,776 basic indexes (32 index areas by 243 item strata) for the U.S. all-items CPI.
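The counts in this step can be checked with a few lines of arithmetic. All figures below are taken from the paragraph above; only the variable names are introduced for illustration.

```python
# Figures quoted in the text.
large_self_representing = 21     # areas with population above 2.5 million
isolated_self_representing = 2   # Anchorage, AK, and Honolulu, HI
non_self_representing = 52
index_areas = 32
item_strata = 243

self_representing = large_self_representing + isolated_self_representing
total_psus = self_representing + non_self_representing
basic_indexes = index_areas * item_strata

print(self_representing, total_psus, basic_indexes)
```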

4. Determine stratification variables.

To best create a sample of areas that represent the entire population, the CPI uses a stratified sample. Many demographic stratification variables were considered, and models including different variables were investigated. The final model selected uses four variables: longitude, latitude, median property value, and median household income.

5. Allocate sample and assign PSUs to strata.

The 23 self-representing PSUs account for approximately 39 percent of the total U.S. population and about 42 percent of the CPI-U population. There are 52 non-self-representing PSUs, which represent the remaining 58 percent of the CPI-U population and include both metropolitan and micropolitan areas.

The next phase of the selection process was to assign the non-self-representing PSUs within each census division to strata based on a model of the four stratification variables. The primary objective of the PSU stratification was to minimize the between-PSU component of variance by making the PSUs within each stratum as homogeneous as possible with respect to the four stratification variables. In addition, to further minimize the variance, the strata within each census division were constructed to have approximately equal populations.

6. Select a sample of PSUs.

The final step of the selection process was to select one PSU per stratum. Before making that final selection, we chose to employ a sample-overlap procedure, which is intended to increase the expected number of non-self-representing areas reselected in the new design. Additionally, we implemented a controlled-selection procedure, which draws a random sample in a way that increases the probability of selecting certain preferred combinations of PSUs. After adjusting the sample selection probabilities with the use of the Ernst sample-overlap procedure and employing controlled selection for the micropolitan areas, we randomly selected one PSU per stratum.³
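The basic idea of drawing one PSU per stratum can be sketched as follows. This is a simplified illustration, not the procedure described above: it draws each PSU with probability proportional to its population and does not implement the Ernst sample-overlap adjustment or controlled selection. The stratum names, PSU names, and populations are hypothetical.

```python
import random


def select_one_psu_per_stratum(strata, rng):
    """Draw one PSU from each stratum with probability proportional to population."""
    selected = {}
    for stratum, psu_populations in strata.items():
        names = list(psu_populations)
        weights = [psu_populations[name] for name in names]
        # random.choices performs a weighted draw; k=1 returns a one-element list.
        selected[stratum] = rng.choices(names, weights=weights, k=1)[0]
    return selected


# Hypothetical strata of non-self-representing PSUs with their populations.
strata = {
    "stratum_1": {"psu_a": 400_000, "psu_b": 250_000, "psu_c": 350_000},
    "stratum_2": {"psu_d": 90_000, "psu_e": 60_000},
}
rng = random.Random(2018)  # fixed seed so the illustration is repeatable
sample = select_one_psu_per_stratum(strata, rng)
print(sample)
```

In a real design, the selection probabilities passed to the draw would be the adjusted ones produced by the overlap and controlled-selection steps rather than raw population shares.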

7. New area design implementation plan.

After selecting the final area design, BLS determined the process for implementing the new geographic sample in the three surveys used to construct the CPI: the Consumer Expenditure (CE) Survey, the Commodities and Services Survey, and the Housing Survey.

For the 2018 area revision, the CE Survey fully converted to the new sample in 2015. However, for the other surveys, which are directly managed by the CPI program, the 21 new PSUs were divided into groups and introduced over a multiyear span. This rotation process distributed the cost of introducing new PSUs into the Housing and Commodities and Services surveys, avoiding a spike in data collection costs before the full conversion to the new area design.

The calculation of price indexes under the new area design began in January 2018, with the introduction of the first set of new PSUs in the sample. Existing PSUs scheduled to be rotated out of the sample later in the implementation process are used as proxies for new PSUs rotating in late, until the complete set of new PSUs has been rotated into the sample. An ideal proxy for a new PSU was considered to be one of the PSUs to be dropped from the sample within the new PSU's geographic stratum. If no dropped PSU was available in that stratum, a proxy was identified through nearest-neighbor rules, with the constraint that the proxy fall within 200 miles of the new PSU. If no eligible proxy existed, the new PSU was considered to be a “geographic hole” within the new area structure. The eight new PSUs with no eligible proxy were given priority in the rotation schedule.
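The proxy rules can be expressed as a short fallback sequence. The sketch below is an interpretation under stated assumptions: distance is measured as great-circle (haversine) distance between representative coordinates, ties within a stratum are broken by choosing the closest dropped PSU, and the 200-mile limit is applied only to the nearest-neighbor fallback. The PSU records and field names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8
MAX_PROXY_DISTANCE_MILES = 200.0  # constraint stated in the text


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def choose_proxy(new_psu, dropped_psus):
    """Return the name of a proxy PSU, or None for a "geographic hole"."""
    def distance(psu):
        return haversine_miles(new_psu["lat"], new_psu["lon"], psu["lat"], psu["lon"])

    # First preference: a dropped PSU in the new PSU's own stratum.
    same_stratum = [p for p in dropped_psus if p["stratum"] == new_psu["stratum"]]
    if same_stratum:
        return min(same_stratum, key=distance)["name"]
    # Fallback: nearest dropped PSU within the distance limit.
    nearby = [p for p in dropped_psus if distance(p) <= MAX_PROXY_DISTANCE_MILES]
    if nearby:
        return min(nearby, key=distance)["name"]
    return None


# Hypothetical PSUs.
dropped = [
    {"name": "old_a", "stratum": "S2", "lat": 40.5, "lon": -100.0},
    {"name": "old_b", "stratum": "S3", "lat": 45.0, "lon": -100.0},
]
new_psu = {"name": "new_1", "stratum": "S1", "lat": 40.0, "lon": -100.0}
print(choose_proxy(new_psu, dropped))
```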

Outlet sample

The outlet sample for most items in the CPI is developed with data from the CE Survey. The survey furnishes data on retail outlets from which metropolitan and micropolitan households purchased well-defined groups of commodities and services to be priced in the CPI.

Commodities and services are grouped into sampling categories based on entry-level items (ELIs) as defined in the CPI classification structure. Some categories consist of only one ELI, while others consist of more. Entry-level items are combined into a single category when the commodities or services generally are sold in the same outlets; for example, boys' outerwear and boys' shirts and sweaters are both in the same category.

Additional information is available in the CE Survey’s section of the BLS Handbook of Methods.

Procedures for selecting items within outlets

Each outlet is assigned a number of entry-level items for price collection. A data collector visits each selected outlet and uses a multistage probability selection technique to select specific items from among all the items the outlet sells that fall within the entry-level item definitions. Additional information on categories and entry-level item titles is available in the entry-level item definitions spreadsheet (appendix 2, ELI list), the CE categories spreadsheet (appendix 3, CE categories), the non-CE categories spreadsheet (appendix 4, NonCE categories), and the CE CPI concordance spreadsheet (appendix 5, CE CPI concordance).

Data collectors first identify all of the items included in an entry-level item definition and all of the items that are offered for sale by the outlet. When an entry-level item contains a large number of items, the data collector groups them by common characteristics, such as brand, size, or type of packaging. With the assistance of the respondent for the outlet, the data collector assigns probabilities of selection to each group.

The probabilities of selection are proportional to the sales of the items included in each group. The data collector may use any of the following procedures to determine the proportion of sales:

Percents: The percent that a specific group represents of the total dollar sales of all the groups listed in a specific stage of disaggregation
Ranks: An ordering by the respondent of the groups from largest to smallest in terms of dollar sales
Dollar Volume Seller: The respondent identifies the unique item with the largest dollar sales volume within the group selected in the previous stage of disaggregation
Ranked Selling Space: Eligible items are ranked by the amount of space they occupy within the store
Equal Probability: Percentages are assigned to each of the groups listed in a disaggregation step solely on the basis of the number of groups

After assigning probabilities of selection, data collectors use a procedure to randomly select one group. They then identify all items included in the selected group, form groups of those items based on the characteristics they have in common, assign probabilities to each group, and randomly select one. Data collectors repeat this process through successive stages until reaching a unique item and describe the selected unique item on a checklist for the entry-level item. Checklists contain the descriptive characteristics necessary to identify the item among all items defined within the entry-level item.
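The successive stages of disaggregation can be sketched as a repeated weighted draw. The example below illustrates only the Percents method, with sales shares supplied at each stage; the outlet, groups, shares, and the nested-dictionary layout are hypothetical and are not a representation of the collection instrument BLS uses.

```python
import random


def pick_group(shares, rng):
    """Pick one group label with probability proportional to its sales share."""
    labels = list(shares)
    return rng.choices(labels, weights=[shares[label] for label in labels], k=1)[0]


def select_unique_item(node, rng):
    """Walk successive disaggregation stages until a unique item is reached.

    `node` is either a string (a unique item) or a dict that maps a group
    label to a (sales_share, child_node) pair.
    """
    path = []
    while isinstance(node, dict):
        shares = {label: share for label, (share, _child) in node.items()}
        label = pick_group(shares, rng)
        path.append(label)
        node = node[label][1]
    return node, path


# Hypothetical outlet: groups by brand, then by package size.
outlet_items = {
    "brand_a": (60, {
        "small_box": (25, "brand_a small box"),
        "large_box": (75, "brand_a large box"),
    }),
    "brand_b": (40, "brand_b single size"),
}

item, path = select_unique_item(outlet_items, random.Random(7))
print(item, path)
```

Because each stage is a probability draw, the chance of reaching a given unique item is the product of the shares along its path; in this hypothetical outlet, the large box of brand A would be selected with probability 0.60 × 0.75 = 0.45.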

These selection procedures ensure that there is an objective and efficient probability sampling of CPI items other than shelter. They also allow broad definitions of entry-level items, so that the same unique item need not be priced everywhere. The wide variety of specific items greatly reduces the within-item component of variance, reduces the correlation of price movement between areas, and allows a substantial reduction in the number of quotes required to achieve a given variance. Another important benefit from the broader entry-level items is a significantly higher likelihood of finding a priceable item within the definition of the entry-level item in the sample outlet.

The selection process is completed during the visit to the outlet to obtain the price for the selected item. Subsequently, personal visits, telephone calls, or website visits are made, either monthly or bimonthly, to make sure that the item is still sold and to obtain its current price.

Shelter

The CPI Housing Survey provides the data needed to measure price change for the two housing component indexes: owners’ equivalent rent of primary residence (OER) and rent of primary residence (Rent). The Housing Survey follows the rents of a sample of renter-occupied housing units selected to represent both renter- and owner-occupied housing units in the urban United States.

We continuously update the sample of rented housing units by replacing one-sixth of the rented housing unit sample every year on the basis of the latest available U.S. Census Bureau data.

Collecting a large sample less frequently is more efficient for the Housing Survey because rent prices are not as volatile as most other consumer prices. This efficiency is accomplished by assigning each selected neighborhood (called a segment) in a pricing area to 1 of 6 panels, each of which represents a subsample of each pricing area and provides sufficient information for the monthly owners’ equivalent rent of primary residence and rent indexes. One panel is priced each month, so each of the six panels is priced twice a year: panel 1 is collected in January and July, panel 2 in February and August, and so on. Every month, we collect rent prices and other information for one panel and compute the 6-month price ratio (the current rent divided by the rent 6 months ago) for each unit in the panel. The measures of price change for the two housing components are based on weighted averages of these rent ratios.
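The panel schedule and the 6-month rent ratio can be illustrated with a short sketch. The rents and unit weights below are invented, and the simple weighted mean stands in for the weighted averages mentioned above; the actual weights and estimator are not specified here.

```python
import calendar


def panel_months(panel):
    """Return the two calendar months (1-12) in which a panel (1-6) is priced."""
    if panel not in range(1, 7):
        raise ValueError("panel must be between 1 and 6")
    return panel, panel + 6


def six_month_ratio(current_rent, rent_six_months_ago):
    """Current rent divided by the rent collected 6 months earlier."""
    return current_rent / rent_six_months_ago


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


for panel in range(1, 7):
    first, second = panel_months(panel)
    print(panel, calendar.month_name[first], calendar.month_name[second])

# Hypothetical (current rent, rent 6 months ago, weight) for units in one panel.
units = [(1030.0, 1000.0, 2.0), (1500.0, 1500.0, 1.0), (808.0, 800.0, 1.0)]
ratios = [six_month_ratio(current, previous) for current, previous, _ in units]
panel_change = weighted_mean(ratios, [weight for _, _, weight in units])
print(f"weighted 6-month rent ratio: {panel_change:.4f}")
```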

The 2018 geographic sample of the CPI partitioned the urban United States into 32 CPI areas and selected 75 pricing areas. These areas were metropolitan and micropolitan areas and were selected using probability proportional to size (PPS) sampling, which used the size of the 2010 population. For the Housing Survey, CPI pricing areas are further partitioned into neighborhoods called segments, formed from one (in most cases) or more U.S. Census Bureau block groups and containing at least 50 housing units in large (A-size) self-representing pricing areas and at least 30 in smaller non-self-representing pricing areas. With the use of PPS, a sample of segments was selected in each area, in which the size measure was the sum of renters’ actual rents and owners’ estimated implicit rents. The Census Bureau provides the number of renters, the average rents, and the number of owners by block group, whereas BLS estimates the average owners’ implicit rents. An average of about five rental housing units is selected within each segment.
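The segment size measure and a PPS draw can be sketched as follows. The size measure follows the description above (renters' actual rents plus owners' estimated implicit rents). The selection routine uses systematic PPS sampling, which is one common way to carry out a PPS draw and is an assumption here, not a statement of the method BLS used. All segment names and figures are hypothetical.

```python
import random


def size_measure(renters, avg_rent, owners, avg_implicit_rent):
    """Total rent value of a segment: actual rents plus estimated implicit rents."""
    return renters * avg_rent + owners * avg_implicit_rent


def systematic_pps(sizes, n, rng):
    """Select n units by systematic PPS sampling.

    A unit whose size exceeds the sampling interval can be selected more
    than once.
    """
    names = list(sizes)
    total = sum(sizes.values())
    interval = total / n
    start = rng.random() * interval          # random start in [0, interval)
    points = [start + i * interval for i in range(n)]
    selected, cumulative, next_point = [], 0.0, 0
    for i, name in enumerate(names):
        cumulative += sizes[name]
        # Guard against floating-point rounding on the final unit.
        upper = float("inf") if i == len(names) - 1 else cumulative
        while next_point < n and points[next_point] < upper:
            selected.append(name)
            next_point += 1
    return selected


# Hypothetical segments: (renters, average rent, owners, average implicit rent).
segments = {
    "segment_1": (40, 1000.0, 60, 1200.0),
    "segment_2": (80, 900.0, 20, 1100.0),
    "segment_3": (10, 1500.0, 90, 1800.0),
    "segment_4": (50, 700.0, 50, 800.0),
}
sizes = {name: size_measure(*values) for name, values in segments.items()}
chosen = systematic_pps(sizes, 2, random.Random(1))
print(sizes)
print(chosen)
```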

Housing sample

In 2010, the CPI undertook a three-stage effort to improve the Housing Survey. The first and second stages used the 2000 census. The first stage was a 4-year sample augmentation with a goal of adding 16,000 units, mainly in neighborhoods with seriously depleted renter samples, and increasing the size of the sample to its target. The CPI began using data from this augmentation in the owners’ equivalent rent of primary residence and rent indexes for July 2010.

The second stage was a sample replacement meant to replace the rental units introduced in 1999. The November 2012 CPI was the first that used a new sample from this stage and the May 2016 CPI was the first in which the Housing Survey sample was drawn entirely from the 2000 census.

The final stage was a regular replacement commencing in 2016 and ending in 2022. It replaced the 2000 census-based sample with one based on the American Community Survey using 2010 decennial census geography. This stage will continue into the future and, for the first time, give the CPI Housing Survey a process that keeps its sample continuously updated.

BLS staff use purchased address lists and a mail prescreening survey to locate housing units in the segments. The lists indicate the probability that an address is owner-occupied, and the addresses provide a means of determining whether an address is a commercial establishment. This information is used to determine sampling rates for the mail prescreening survey and to determine if selected addresses are commercial or residential and, if they are residential, their tenure (owner- or renter-occupied). Only those addresses the survey identifies as renter-occupied and those with no response are sent out for data collectors to screen.

Staff must find an eligible respondent for each address. During screening, the interviewing software directs the data collectors through a structured series of questions to verify that the unit is renter-occupied. Data collectors further confirm the following:

the unit is the primary residence of the occupant
the occupant is not a relative of the landlord
the unit is not institutional
the unit is not public housing
the unit is not an assisted-living facility with activities of daily living provided to an occupant

These questions help the data collectors determine if a selected address is eligible for the Housing Survey sample.

The data collector has a multi-month period to screen and initiate the units in a segment; units that the data collector is not able to screen (usually because no respondent could be contacted) go back out “on panel” for another screening attempt. This process should yield an expected number (usually five) of in-scope housing units in each segment that will be initiated into the Housing Survey sample.

Initiation and pricing

Once a selected address has been successfully screened, the data collector immediately proceeds to initiate the housing unit into the Housing Survey sample. Initiation is the initial collection of rent data, which the data collector obtains by asking another structured series of questions. These data include the rent that is paid and the specific housing services associated with the unit. They are the basis for all calculations of rent change during the life of the housing unit in the housing sample.

Once a unit is initiated, it will be priced on panel every 6 months. In addition, any in-scope units that are not successfully initiated go back on panel for another attempted initiation. A housing unit’s initiation generally does not occur in its on-panel month, so the housing unit must be priced on panel for 2 cycles to provide a 6-month interval before price changes can be used in the indexes. During initiation and during each pricing, BLS collects

contract rent and rental period (monthly, bimonthly, weekly, or for a specified number of days);
utilities, facilities, and any other such items included in the rent;
any subsidies (e.g., Section 8) or reductions in the rent in exchange for services the tenant provided;
any extra charges included in the contract rent for optional items, such as parking;
the number of rooms, type of housing structure, and other physical characteristics; and
equipment used for air conditioning and fuels used for heat and hot water.

In addition, to ensure that the unit remains in scope, we ask the screening questions every 2 years or when a change of occupant occurs.

Notes

¹ Core-based statistical areas (CBSAs) consist of the county or counties or equivalent entities associated with at least one core (urbanized area or urban cluster) of at least 10,000 population, plus adjacent counties having a high degree of social and economic integration with the core as measured through commuting ties with the counties associated with the core.

² Revised delineations of metropolitan statistical areas, micropolitan statistical areas, and combined statistical areas, and guidance on uses of the delineations of these areas, Bulletin No. 13-01 (Office of Management and Budget, February 28, 2013), https://www.whitehouse.gov/sites/default/files/omb/bulletins/2013/b-13-01.pdf.

³ See https://www.bls.gov/ore/pdf/st990060.pdf

Last Modified Date: February 21, 2023