
Use of medical claims data in the CPI for physicians’ and hospital services

Starting with the release of October 2024 indexes in November 2024, the BLS will begin using secondary source medical claims data to measure price change for the private insurance portion of the physicians’ services and outpatient hospital services categories of the Consumer Price Index (CPI).^[1] Medical claims data provide significantly more observations and capture more types of services than data collected using traditional methods. These data also address existing challenges with collecting data from hospitals and doctors’ offices, challenges that are especially relevant for private insurance prices.

For physicians’ services, the CPI program is combining medical claims data with traditionally collected data for self-pay (cash) and Medicare Part B payers. For hospital services, the CPI program is using medical claims data for the portion of outpatient services that are paid for with private insurance. The rest of the hospitals index (outpatient self-pay (cash), outpatient Medicare Part B, and all payers for inpatient services) will continue to be collected using traditional methods.

This document includes details on the claims data and how it will be used in the indexes. The Monthly Labor Review article “Incorporating medical claims data into the Consumer Price Index” describes the research behind this methodology change.

Traditional methodology

Traditionally, all prices used in the CPI for physicians' services and hospital services have been collected using data from the Commodities and Services (C&S) survey. Businesses (called “outlets” in the CPI program) that provide physicians’ services are sampled based on results of the Consumer Expenditure Surveys (CE). Hospital services outlets are sampled from a dataset produced by the American Hospital Association.

CPI data collectors work with survey respondents from the sampled outlets to scientifically select specific services to price on a monthly or bimonthly basis. Respondents are asked to provide a price for each sampled service. The characteristics of the original service, such as the exact care provided, the payer type (self-pay (cash), Medicare, or private insurance), and any applicable diagnostic or service codes are held constant over time. The price of the sampled item includes all reimbursements to the outlet including payments by both the patient and the insurance provider (if applicable).

For more information on how C&S survey data are collected and calculated, see the BLS Handbook of Methods and the medical care factsheet for CPI.

Medical claims data

BLS purchases medical claims data from a national health insurance aggregator. The data, which are collected from bills submitted by healthcare providers to a patient’s insurance provider, are a convenience sample of medical procedures that are performed by healthcare providers and reimbursed by insurers.

Before BLS receives the medical claims data, the data provider removes price outliers at the individual claim level. The data provider also filters the data to remove ineligible provider types and procedure codes.

The claims data are received by BLS with a three-month lag, because it takes a few months for the insurer to adjudicate claims. For example, data for a doctor’s office visit that occurred in January (the service month) would not be delivered to BLS until April (the index month). The CPI program uses these January data to calculate April indexes, which are then published in early May. The CPI program’s research indicates that, even with this lag, the medical claims data perform better than the traditionally collected data.^[2]
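The timing described above can be sketched as a small date calculation. This is only an illustration: the function names are hypothetical, and the sketch assumes the lag is always exactly three months and that publication falls in the month after the index month, as in the January example.

```python
from datetime import date

LAG_MONTHS = 3  # service month -> index month, as stated in the text


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    total = d.year * 12 + (d.month - 1) + months
    return date(total // 12, total % 12 + 1, 1)


def index_month(service_month: date) -> date:
    """Index month in which claims for a given service month are used."""
    return add_months(service_month, LAG_MONTHS)


def publication_month(service_month: date) -> date:
    """Month in which the corresponding indexes are published (assumed: index month + 1)."""
    return add_months(index_month(service_month), 1)


if __name__ == "__main__":
    january = date(2024, 1, 1)
    print(index_month(january), publication_month(january))
```

Under these assumptions, services provided in January feed the April index, which is published in May.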

Sampling methodology

The data provider uses a sampling methodology that was developed by the CPI program to select observations from their data frame. Observations are sampled from the secondary source medical claims data for the 75 CPI primary sampling units (PSUs).^[3] The CPI program uses probability proportional to size sampling without replacement to select medical providers (outlets) and individual services at those outlets. Total revenue during the previous 12 months is used to weight the probability calculations.
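As a rough illustration of probability proportional to size sampling without replacement, the sketch below draws units one at a time, each time with probability proportional to revenue among the units not yet selected. This successive-draw scheme is one common way to implement the idea; the text does not say which variant the CPI program uses, and the function name and revenue figures are hypothetical.

```python
import random


def pps_sample_without_replacement(revenue, n, rng):
    """Draw up to n distinct units, one at a time, with probability
    proportional to revenue among the units not yet chosen.

    Assumes every revenue value is positive.
    """
    remaining = dict(revenue)
    chosen = []
    while remaining and len(chosen) < n:
        units = list(remaining)
        weights = [remaining[u] for u in units]
        pick = rng.choices(units, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # without replacement: a unit is drawn at most once
    return chosen


if __name__ == "__main__":
    # Hypothetical 12-month revenue by provider (NPI placeholder names).
    revenue_by_npi = {"npi_A": 500.0, "npi_B": 300.0, "npi_C": 150.0, "npi_D": 50.0}
    print(pps_sample_without_replacement(revenue_by_npi, 2, random.Random(0)))
```

If fewer units are available than requested, the function returns all of them, which mirrors the rule that all available NPIs are sampled when a PSU has too few.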

For each PSU, BLS independently selects a sample of 200 physician National Provider Identifiers (NPIs) and 10 hospital NPIs.^[4] Medical services are defined using Current Procedural Terminology (CPT) codes, and each service is broken out by both the insurance company that pays for the service and whether that insurance is a Medicare Advantage (MA) plan or not.^[5],[6] Ten medical services are sampled for each physician and 100 services are sampled at each hospital. This procedure results in the collection of up to 2,000 unique observations for physicians’ services and 1,000 unique observations for outpatient hospital services per PSU per month.

Each individual observation in the claims data consists of a unique combination of NPI, insurance provider, and service (CPT code). Physicians’ services are additionally identified by place of service (such as a doctor’s office or clinic). Each observation has an average price, which is calculated across all transactions at the outlet with the same insurance provider and CPT code. The monthly data also include a quantity value that specifies the number of individual claims used to create the average price. The average prices are then used to calculate the unit-value indexes, which are calculated at the service level and weighted by the number of times each service was provided rather than by expenditure.^[7] The average price and quantity values are updated each month over the two years that the observations are included in the sample.
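A minimal sketch of how an average price and a quantity could be derived for each observation from individual claims follows. The record layout (the `npi`, `insurer`, `cpt`, and `paid` fields) is hypothetical, and place of service is omitted for brevity; in practice the data provider performs this aggregation before delivery.

```python
from collections import defaultdict


def summarize_claims(claims):
    """Group claims by (npi, insurer, cpt) and return
    {key: (average_price, quantity)} for one month of data."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of payments, claim count]
    for claim in claims:
        key = (claim["npi"], claim["insurer"], claim["cpt"])
        totals[key][0] += claim["paid"]
        totals[key][1] += 1
    return {key: (paid / count, count) for key, (paid, count) in totals.items()}


if __name__ == "__main__":
    # Fabricated claims for illustration only.
    month_claims = [
        {"npi": "1", "insurer": "X", "cpt": "99213", "paid": 100.0},
        {"npi": "1", "insurer": "X", "cpt": "99213", "paid": 120.0},
        {"npi": "1", "insurer": "Y", "cpt": "99213", "paid": 90.0},
    ]
    for key, (avg_price, quantity) in summarize_claims(month_claims).items():
        print(key, avg_price, quantity)
```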

Sample rotation

The medical claims sample is fully rotated every two years, with 25 percent of PSUs rotating every 6 months. The samples rotate in April and October index months of every year (using data for services provided in January and July, respectively). This rotation is more frequent than the four-year sample rotation used for most CPI items. This is beneficial for medical data because it will pick up new or changing procedure codes more quickly.

In each rotation month, the data provider samples new outlets and observations in the rotating PSUs and provides their corresponding average prices from the previous month. This is done so that the CPI program has a price relative for the first month in the areas that are rotating.

Reweighting claims data for representativeness

The CPI program calculates adjustment terms (see example below) to modify the quantity weights at the CPT subcategory level so that the sample expenditure share matches the population expenditure share. The eight CPT subcategories included in the data are anesthesia, evaluation and management, HCPCS level II,^[8] medicine services and procedures, pathology and laboratory procedures, radiology procedures, surgery, and Category II codes.^[9] For services covered with Medicare Advantage, the reweighting is done at the regional level.^[10] For services covered with private insurance, the reweighting is done at the CPI basic index area level.

This reweighting is done separately for Medicare Advantage and non-Medicare Advantage claims to account for the differences in code type distribution by payer type. One limitation of reweighting the sample using this method is that the current time period quantity of all observations in the CPT subcategory must be nonzero (i.e., each subcategory must have at least one observation per area) in order for the reweighting to work. If an area does not contain any observations for one of the subcategories, the weight from that subcategory is divided and reallocated to the remaining subcategories according to their expenditure. For example, if the anesthesia subcategory has a population share of five percent but no sampled observations in a given area, that five percent is divided among the seven remaining subcategories in proportion to their expenditure. The table below shows how the adjustment terms are calculated in practice (the example data are fabricated):


Code Category	Population Expenditure	Population Share	Sample Expenditure	Sample Share	Difference Between Population Share and Sample Share
Anesthesia	$120,000,000	5.0%	$1,750,000	2.5%	2.5%
Category II Codes	$30,000,000	1.2%	$1,000,000	1.4%	-0.2%
Evaluation and Management	$1,000,000,000	41.5%	$50,000,000	72.2%	-30.7%
HCPCS Level II	$225,000,000	9.3%	$5,000,000	7.2%	2.1%
Medicine Services and Procedures	$325,000,000	13.5%	$4,000,000	5.8%	7.7%
Pathology and Laboratory Procedures	$100,000,000	4.1%	$750,000	1.1%	3.0%
Radiology Procedures	$275,000,000	11.4%	$2,750,000	4.0%	7.4%
Surgery	$335,000,000	13.9%	$4,000,000	5.8%	8.1%
Total	$2,410,000,000	100.0%	$69,250,000	100.0%	--

In this example, the evaluation and management codes are overrepresented in the sample, as the sample share is about 31 percentage points higher than the population share. The overrepresentation of this group causes the other groups to be underrepresented. For this example, the adjustment term for evaluation and management will be normalized to 1.

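The sample share of Anesthesia as compared to Evaluation and Management codes is:

\[ \frac{\text{sample share}_{\text{Anesthesia}}}{\text{sample share}_{\text{E\&M}}} = \frac{\$1{,}750{,}000}{\$50{,}000{,}000} = 0.035 \]

The population share of Anesthesia as compared to Evaluation and Management codes is:

\[ \frac{\text{population share}_{\text{Anesthesia}}}{\text{population share}_{\text{E\&M}}} = \frac{\$120{,}000{,}000}{\$1{,}000{,}000{,}000} = 0.12 \]

Therefore, the adjustment term is given by:

\[ \text{adjustment term}_{\text{Anesthesia}} = \frac{0.035}{0.12} \approx 0.292 \]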

The adjustment term is calculated for the other categories using the method above and is then used to calculate the adjusted sample expenditure and sample shares. The annual sample expenditure is divided by the adjustment term, which changes the weight of the observations within the code category in the sampled data to match the population share provided by the data provider.


Code category	Adjustment term	Adjusted sample expenditure	Adjusted sample share	Population Share	Difference
Anesthesia	0.292	$6,000,000	5.0%	5.0%	0%
Category II Codes	0.667	$1,500,000	1.2%	1.2%	0%
Evaluation and Management	1	$50,000,000	41.5%	41.5%	0%
HCPCS Level II	0.444	$11,250,000	9.3%	9.3%	0%
Medicine Services and Procedures	0.246	$16,250,000	13.5%	13.5%	0%
Pathology and Laboratory Procedures	0.15	$5,000,000	4.1%	4.1%	0%
Radiology Procedures	0.2	$13,750,000	11.4%	11.4%	0%
Surgery	0.239	$16,750,000	13.9%	13.9%	0%
Total		$120,500,000	100.0%	100.0%	--
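The two tables above can be reproduced with a few lines of code. The sketch below uses the fabricated example expenditures from the first table; the function and variable names are hypothetical and are not taken from any BLS system.

```python
# Fabricated example expenditures from the tables above.
POP_EXP = {
    "Anesthesia": 120_000_000,
    "Category II Codes": 30_000_000,
    "Evaluation and Management": 1_000_000_000,
    "HCPCS Level II": 225_000_000,
    "Medicine Services and Procedures": 325_000_000,
    "Pathology and Laboratory Procedures": 100_000_000,
    "Radiology Procedures": 275_000_000,
    "Surgery": 335_000_000,
}
SAMPLE_EXP = {
    "Anesthesia": 1_750_000,
    "Category II Codes": 1_000_000,
    "Evaluation and Management": 50_000_000,
    "HCPCS Level II": 5_000_000,
    "Medicine Services and Procedures": 4_000_000,
    "Pathology and Laboratory Procedures": 750_000,
    "Radiology Procedures": 2_750_000,
    "Surgery": 4_000_000,
}


def adjustment_terms(sample_exp, pop_exp, base):
    """Ratio of (sample share relative to base) to (population share relative to base)."""
    terms = {}
    for category in sample_exp:
        sample_ratio = sample_exp[category] / sample_exp[base]
        pop_ratio = pop_exp[category] / pop_exp[base]
        terms[category] = sample_ratio / pop_ratio
    return terms


def adjusted_expenditure(sample_exp, terms):
    """Divide each category's sample expenditure by its adjustment term."""
    return {category: sample_exp[category] / terms[category] for category in sample_exp}


if __name__ == "__main__":
    terms = adjustment_terms(SAMPLE_EXP, POP_EXP, base="Evaluation and Management")
    adjusted = adjusted_expenditure(SAMPLE_EXP, terms)
    total = sum(adjusted.values())
    for category in SAMPLE_EXP:
        share = adjusted[category] / total
        print(f"{category}: term={terms[category]:.3f}, adjusted share={share:.1%}")
```

Because every category is scaled relative to the same base category, the adjusted sample shares match the population shares regardless of which category is normalized to 1.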

This process only affects the weighting of code categories within an area. It does not affect the weighting of the code categories across areas because the CPI uses Consumer Expenditure data as weights for aggregation of areas up to the national level.
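Within an area, the fallback for a subcategory with no sampled observations (described earlier in this section) amounts to renormalizing the population shares over the subcategories that are present. The sketch below illustrates this with the fabricated population expenditures from the example; the function name is hypothetical.

```python
def reallocate_missing(pop_exp, observed):
    """Population shares over the observed subcategories only.

    Dropping the missing subcategories and renormalizing is equivalent to
    dividing their weight among the remaining subcategories in proportion
    to expenditure.
    """
    kept = {cat: exp for cat, exp in pop_exp.items() if cat in observed}
    total = sum(kept.values())
    return {cat: exp / total for cat, exp in kept.items()}


if __name__ == "__main__":
    # Fabricated population expenditures from the example table.
    pop_exp = {
        "Anesthesia": 120_000_000,
        "Category II Codes": 30_000_000,
        "Evaluation and Management": 1_000_000_000,
        "HCPCS Level II": 225_000_000,
        "Medicine Services and Procedures": 325_000_000,
        "Pathology and Laboratory Procedures": 100_000_000,
        "Radiology Procedures": 275_000_000,
        "Surgery": 335_000_000,
    }
    observed = set(pop_exp) - {"Anesthesia"}  # no anesthesia observations in this area
    for cat, share in reallocate_missing(pop_exp, observed).items():
        print(f"{cat}: {share:.1%}")
```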

Index methodology

Price change for claims data is constructed using an annually chained Lowe index that uses the average monthly quantity from the prior year:

\[ R^{t,t-1} = \frac{\sum_i p_i^{t} \, q_i^{yrave}}{\sum_i p_i^{t-1} \, q_i^{yrave}} \]

where p_i^t is the average price of observation i in month t and q_i^yrave is the average monthly quantity based on the previous year of data for observation i. If there are no claims for a sampled item in a given month, a price relative cannot be calculated. In this case, the missing current-period price is imputed by using the aggregated price relative for the corresponding PSU for physicians’ or outpatient hospital services in the given month. Imputing relatives is consistent with standard CPI cell mean methodology.
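The calculation described above can be sketched as follows: a quantity-weighted ratio of current to previous prices, with a missing current price imputed from the aggregate relative of the priced observations. The sketch assumes all observations passed in belong to a single PSU and a single service category; the function names and data are hypothetical.

```python
def lowe_relative(prev_prices, curr_prices, yr_avg_qty):
    """Lowe price relative over observations that have a current price:
    sum(p_t * q) / sum(p_{t-1} * q), with q the prior-year average monthly quantity."""
    priced = [i for i in yr_avg_qty if curr_prices.get(i) is not None]
    numerator = sum(curr_prices[i] * yr_avg_qty[i] for i in priced)
    denominator = sum(prev_prices[i] * yr_avg_qty[i] for i in priced)
    return numerator / denominator


def impute_missing(prev_prices, curr_prices, yr_avg_qty):
    """Fill each missing current price with previous price x aggregate relative."""
    relative = lowe_relative(prev_prices, curr_prices, yr_avg_qty)
    return {
        i: curr_prices[i] if curr_prices.get(i) is not None else prev_prices[i] * relative
        for i in yr_avg_qty
    }


if __name__ == "__main__":
    # Fabricated observations; "c" has no claims in the current month.
    prev = {"a": 100.0, "b": 50.0, "c": 200.0}
    curr = {"a": 110.0, "b": 50.0, "c": None}
    qty = {"a": 10.0, "b": 20.0, "c": 5.0}
    filled = impute_missing(prev, curr, qty)
    print(lowe_relative(prev, curr, qty), filled["c"])
```

One property of this kind of imputation is that the aggregate relative is unchanged once the imputed prices are included, so items without claims neither raise nor lower measured price change.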

Any observation with a price relative more than three standard deviations above or below the mean is removed from the price index calculation.
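A minimal sketch of this three-standard-deviation screen is shown below. The text does not specify whether the population or sample standard deviation is used, or over which group of observations the mean is taken; the sketch assumes the population standard deviation over whatever set of relatives is passed in, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def trim_outliers(relatives, k=3.0):
    """Keep observations whose price relative lies within k standard
    deviations of the mean (population standard deviation assumed)."""
    values = list(relatives.values())
    center = mean(values)
    spread = pstdev(values)
    return {obs: r for obs, r in relatives.items() if abs(r - center) <= k * spread}


if __name__ == "__main__":
    # Fabricated relatives: twenty unchanged prices and one tripled price.
    relatives = {f"obs_{n}": 1.0 for n in range(20)}
    relatives["outlier"] = 3.0
    kept = trim_outliers(relatives)
    print(len(kept), "outlier" in kept)
```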

Weighting payer types

The CPI program uses data from the Medical Expenditure Panel Survey (MEPS)^[11] to weight the three payer types (self-pay (cash), Medicare, and private insurance) included in the physicians’ and hospital services indexes to create indexes representing all payers. This ensures that the physicians’ and hospital services indexes reflect the correct shares of payer types. The MEPS data are available with a 3-year lag. The MEPS data are also used to provide insured-payer weights by population for the CPI-U, CPI-W, and CPI-E.

The CPI uses a multi-stage process to achieve the appropriate payer type representativity. First, outpatient hospital expenditures are isolated from overall hospital expenditures in the MEPS data. Next, the share of private insurance and Medicare Advantage expenditures is calculated using MEPS data for both physicians’ services and outpatient hospital services within each region. Then, all CPI areas within the region are assigned the corresponding regional MEPS share (for example, the Boston-Cambridge-Newton, MA-NH area is assigned the share for the Northeast region).

Aggregating payer types

For physicians’ services, the CPI program uses the MEPS shares to calculate arithmetic-average area price relatives that combine private insurance claims (as described in the Index methodology section above), self-pay (cash), and Medicare Part B payers (as described in the Traditional methodology section above).

For hospital services, the CPI program uses the MEPS shares to calculate arithmetic-average area price relatives that combine outpatient private insurance claims with the rest of the index (outpatient self-pay (cash), outpatient Medicare Part B, and all payers for inpatient services). The resulting relatives are arithmetic averages representing all payers for both indexes.

Finally, CPI aggregates the area relatives using area weights provided by the CE survey to create U.S.-level indexes representing services paid for by all payers.
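The two aggregation stages described in this section can be sketched as nested weighted arithmetic means: payer-type relatives are first combined within each area using payer shares, and the resulting area relatives are then combined using area weights. All names, shares, weights, and relatives below are hypothetical; actual shares come from MEPS and actual area weights from the CE survey.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean of `values`, using the matching entries of `weights`."""
    total_weight = sum(weights[key] for key in values)
    return sum(values[key] * weights[key] for key in values) / total_weight


def us_relative(area_payer_relatives, payer_shares, area_weights):
    """Stage 1: combine payer relatives within each area.
    Stage 2: combine the area relatives into a U.S.-level relative."""
    area_relatives = {
        area: weighted_mean(relatives, payer_shares[area])
        for area, relatives in area_payer_relatives.items()
    }
    return weighted_mean(area_relatives, area_weights)


if __name__ == "__main__":
    # Fabricated physicians' services example with two areas.
    relatives = {
        "area_1": {"private": 1.02, "self_pay": 1.00, "medicare_b": 1.01},
        "area_2": {"private": 1.00, "self_pay": 1.02, "medicare_b": 1.01},
    }
    shares = {
        "area_1": {"private": 0.5, "self_pay": 0.1, "medicare_b": 0.4},
        "area_2": {"private": 0.6, "self_pay": 0.1, "medicare_b": 0.3},
    }
    area_weights = {"area_1": 2.0, "area_2": 1.0}
    print(us_relative(relatives, shares, area_weights))
```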

Last Modified Date: October 18, 2024

^[1] Both inpatient and outpatient services are included in the hospital services index (SEMD01).

^[2] See the MLR article linked at the beginning of this paper for more information on this research.

^[3] For more information on PSUs, see the Handbook of Methods (link on page 1).

^[4] If fewer than 200 distinct physician NPIs or 10 hospital NPIs are available in the PSU, all available NPIs are sampled.

^[5] While Medicare Advantage is part of Medicare, the CPI program treats Medicare Advantage plans as distinct from traditional Medicare because they are administered by private companies and have different prices than traditional Medicare. Medicare Advantage is also known as Medicare Part C.

^[6] The data aggregator masks the identity of the insurance company, but the selected insurers are held constant over time.

^[7] For a discussion of unit-value indexes and claims data, see Accounting for health and health care: approaches to measuring the sources and costs of their improvement (National Research Council, 2010).

^[8] Healthcare Common Procedure Coding System (HCPCS) Level II codes are used to describe non-physicians’ services and medical goods, such as medical transport, certain drugs, or surgical supplies.

^[9] Category II Current Procedural Terminology (CPT) codes are used to report certain services, such as diagnostic screenings or preventative interventions, that are associated with positive medical outcomes and quality care.

^[10] The four geographic regions are Northeast, South, Midwest, and West. For more information, see the Presentation section of the CPI Handbook of Methods.

^[11] MEPS is produced by the Agency for Healthcare Research and Quality, which is a part of the Department of Health and Human Services. The MEPS data include information on specific types of medical services provided in the U.S., how frequently they are provided, their cost, and how they are paid. Information on MEPS can be found at https://meps.ahrq.gov/mepsweb/about_meps/survey_back.jsp