BLS Presentations at the 2024 Joint Statistical Meetings

Visit the conference website for more information.

Monday, August 5, 2024

Time Series Analysis of Consumer Price Index Weights

BLS author(s): MoonJung Cho

Weights are one of the main components of the Consumer Price Index (CPI) formula. The CPI program currently revises its fixed quantity weights annually. Using graphical displays and correlation analysis, we explore the relationships between CPI series whose weights come from different reference periods, as well as the relationships among the weights themselves. Our focus is on analyzing the seasonal components of CPI series. The initial investigation used frequency analysis based on the Discrete Fourier Transform (DFT). We further investigate other aspects of CPI series, including trend, changes in rates, jump discontinuities, and outliers.
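As a minimal sketch of the kind of DFT-based frequency analysis described above, the code below removes a linear trend, computes a periodogram with a direct Discrete Fourier Transform, and reports the dominant period. The monthly series is synthetic (a hypothetical trend plus a 12-month cycle), not CPI data, and the least-squares detrending step is an assumption of this illustration rather than a documented step of the study.

```python
import cmath
import math

def detrend(x):
    """Remove an ordinary-least-squares linear trend (intercept + slope) from x."""
    n = len(x)
    t_bar = (n - 1) / 2
    x_bar = sum(x) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    slope = sum((t - t_bar) * (v - x_bar) for t, v in enumerate(x)) / sxx
    return [v - (x_bar + slope * (t - t_bar)) for t, v in enumerate(x)]

def periodogram(x):
    """Return (frequency in cycles per observation, |DFT|^2 / n) for k = 1..n//2."""
    n = len(x)
    out = []
    for k in range(1, n // 2 + 1):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        out.append((k / n, abs(coef) ** 2 / n))
    return out

# Hypothetical 5-year monthly series: linear trend plus a 12-month seasonal cycle.
series = [100 + 0.2 * t + 3 * math.sin(2 * math.pi * t / 12) for t in range(60)]
spectrum = periodogram(detrend(series))
peak_freq, _ = max(spectrum, key=lambda fp: fp[1])
dominant_period = 1 / peak_freq
print(f"dominant period: {dominant_period:.1f} months")  # → 12.0
```

Detrending first keeps the low-frequency leakage from a trend from masking the seasonal peak; with real CPI series, irregular components and level shifts would also show up in the spectrum.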

Session: SPEED 2: Data Challenge II & Methods for Correlated Data, Part 1

Time: 2:00 PM–3:50 PM

Location: Room CC-E141

An Analysis of Poverty Through a Consumption Lens: Research from the U.S. Bureau of Labor Statistics

BLS author(s): Thesia I. Garner, Brett Matsumoto, and Jake Schild

In this study, we conduct an in-depth analysis of consumption poverty in comparison with pre-tax income poverty. Both relative and absolute poverty are considered. We produce poverty statistics by demographic group and examine how poverty changed from 2019 to 2022, a span covering the periods before, during, and after the COVID-19 pandemic. This work builds on research conducted over the past couple of years at the U.S. Bureau of Labor Statistics on the development of a series of consumption measures. The primary data source for these measures is the Consumer Expenditure Surveys (CE). An article in the April issue of the Monthly Labor Review describes the development of the measure and includes simple poverty and inequality statistics for 2019-2021 as examples of how the measure could be used.
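The difference between relative and absolute poverty can be illustrated with a short sketch. Here the relative threshold is assumed to be 50 percent of weighted median consumption (a common convention, not necessarily the one used in this study), the absolute threshold is a fixed hypothetical value, and all consumption values and survey weights are invented for the example.

```python
def weighted_median(values, weights):
    """Smallest value at which cumulative weight reaches half of the total weight."""
    half = sum(weights) / 2
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value

def poverty_rate(consumption, weights, threshold):
    """Weighted share of units whose consumption falls below the threshold."""
    poor = sum(w for c, w in zip(consumption, weights) if c < threshold)
    return poor / sum(weights)

# Hypothetical equivalized annual consumption (thousands of dollars) and weights.
consumption = [12, 18, 25, 30, 40, 55, 70, 90]
weights = [1.5, 1, 1, 1, 1, 1, 1, 0.5]

relative_threshold = 0.5 * weighted_median(consumption, weights)  # assumption: 50% of median
absolute_threshold = 20  # hypothetical fixed real threshold
relative_rate = poverty_rate(consumption, weights, relative_threshold)
absolute_rate = poverty_rate(consumption, weights, absolute_threshold)
print(relative_threshold, relative_rate, absolute_rate)
```

A relative threshold moves with the consumption distribution from year to year, while an absolute threshold stays fixed in real terms, which is why the two can tell different stories across the pandemic period.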

Session: Reimagining Poverty Measurement: Insights from Cutting-Edge Research

Time: 8:30 AM–10:20 AM

Location: Room CC-C122

Visualization Tools for Time Series Model Selection

BLS author(s): Kate Eckerle

Price indexes that place quantity information on the same time scale as prices can capture real-time phenomena such as substitution and, more generally, evolving consumer preferences. However, because BLS quantity information (obtained from households) arrives with a lag of approximately one year, such an index cannot be produced until that year has passed. To provide a timely statistic, BLS issues a preliminary version of the index, currently based on a constant elasticity of substitution (CES) model. This approach applies a fixed level of substitution across all items, areas, and months. We propose instead to forecast the weights needed by the index with a seasonal multivariate time series model of item-area expenditure shares and to incorporate item-area prices, which are presumed to drive much of the substitution behavior. The choice of time series model and its parameters has a profound effect on the behavior of the index, and visualization plays an important role in this selection. To choose the model that best captures the dynamic consumer behavior underlying changes in the index, graphs showing how parameter choices affect the fit of the index are necessary.
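A CES price index is often written in the Lloyd-Moulton form I = [Σ s_i (p_i^t / p_i^0)^(1-σ)]^(1/(1-σ)), where s_i are base-period expenditure shares and σ is a single elasticity of substitution. The sketch below implements that textbook form to show how one fixed σ applies to every item; it is an illustration, not the BLS production formula, and the shares and price relatives are hypothetical.

```python
import math

def ces_index(shares, relatives, sigma):
    """Lloyd-Moulton CES index with one elasticity sigma applied to all items.

    shares: base-period expenditure shares summing to 1.
    relatives: price relatives p_t / p_0 for the same items.
    """
    if abs(sigma - 1.0) < 1e-12:
        # Limit as sigma -> 1 is the weighted geometric mean (Cobb-Douglas case).
        return math.exp(sum(s * math.log(r) for s, r in zip(shares, relatives)))
    e = 1.0 - sigma
    return sum(s * r ** e for s, r in zip(shares, relatives)) ** (1.0 / e)

# Hypothetical two-item example: one item rises 10 percent, the other is flat.
shares = [0.5, 0.5]
relatives = [1.10, 1.00]
for sigma in (0.0, 1.0, 2.0):
    print(sigma, round(ces_index(shares, relatives, sigma), 6))
```

With σ = 0 the formula reduces to a fixed-basket (Laspeyres-type) average, and larger σ implies more substitution away from the item whose price rose, lowering the index; a single σ cannot vary by item, area, or month, which is the restriction the proposed forecasting approach targets.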

Session: Statistical Visualization Concepts, Tools and Applications for Evidence-Based Policymaking and Improved Public Communication

Time: 10:30 AM–12:20 PM

Location: CC-258

Using Linked Micromaps for Evidence-Based Policy

BLS author(s): Randall Powers

Linked micromaps were developed to display geographically indexed statistics in an intuitive way by linking them to a sequence of small maps. The approach integrates several visualization design principles, such as small multiples, discrete color indexing, and ordering. Linked micromaps can also incorporate other types of data displays connected to geography, including scatterplots, boxplots, time series plots, confidence intervals, and more. Initial applications of micromaps used data from the National Cancer Institute and the Environmental Protection Agency. In this presentation, we will show how linked micromaps can be used to better understand and explore relationships and distributions of statistics linked to U.S. states and the District of Columbia. We will compare linked micromaps with other popular data displays, such as bubble charts, choropleth maps, and bar charts. We will illustrate how linked micromaps can be used for evidence-based decision-making using data from the Bureau of Labor Statistics (e.g., Quarterly Census of Employment and Wages, Occupational Employment and Wage Statistics) and the Census Bureau (e.g., Building Permits Survey, Community Resilience Estimates).
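The ordering, small-multiples, and discrete color-indexing principles mentioned above can be sketched as a layout step: sort areas by the statistic, split them into small panels, and reuse the same small set of color indices in every panel so each row links to one highlighted area on its panel's map. Five areas per panel is assumed here as a common choice; the area labels and values are hypothetical, and drawing the maps themselves is not shown.

```python
def micromap_panels(values, group_size=5):
    """Sort areas by value (descending) and split them into small-multiple panels.

    Returns a list of panels; each panel is a list of (area, value, color_index),
    where color_index is reused across panels so each color links a row to a map.
    """
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=True)
    panels = []
    for start in range(0, len(ordered), group_size):
        group = ordered[start:start + group_size]
        panels.append([(area, value, color) for color, (area, value) in enumerate(group)])
    return panels

# Hypothetical statistic for seven areas.
values = {"A": 3.1, "B": 5.4, "C": 4.2, "D": 2.0, "E": 6.3, "F": 3.8, "G": 1.5}
for panel in micromap_panels(values):
    print(panel)
```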

Session: Statistical Visualization Concepts, Tools and Applications for Evidence-Based Policymaking and Improved Public Communication

Time: 10:30 AM–12:20 PM

Location: CC-258

Preliminary Blended Index Variance Estimation with Census Trade Data

BLS author(s): Daniel Yang and Daniell Toth

The International Price Program (IPP) at the Bureau of Labor Statistics (BLS) produces the Import and Export Price Indexes (MXPI), which consist of two components: import merchandise and export merchandise. The MXPI program is transitioning to replace a portion of its directly collected establishment survey data with Census Trade Data (CTD). CTD is administrative trade transaction data that includes specific shipment records, such as international Harmonized System (HS) product classification codes for the United States, trade values in dollars, the foreign country of import or export, and shipment quantities. CTD has the advantage of covering almost all U.S. trade within a month. However, the price available from CTD is an average shipment price, which differs from the prices collected directly from business establishments in the survey. The MXPI program applies bootstrap resampling to the directly collected survey data to assess sampling error; CTD, however, introduces types of error that have not been considered before. In this preliminary study, we explore variance estimation for an index blended from two data sources: an establishment survey and a census.
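A minimal sketch of the setup described above, under stated assumptions: the CTD price is taken to be a unit value (total trade value divided by total quantity), the blended index is assumed to be a fixed-weight average of a survey component and a CTD component, and CTD is treated as a census with no sampling error. The function names, blending weight, and all numbers are hypothetical; this is not the IPP's estimation method.

```python
import random
import statistics

def unit_value(shipments):
    """Average shipment price: total trade value / total quantity."""
    return sum(v for v, _ in shipments) / sum(q for _, q in shipments)

def bootstrap_blend_variance(survey_relatives, ctd_relative, w_survey, reps=1000, seed=2024):
    """Bootstrap the survey component only; the CTD relative is held fixed (assumption)."""
    rng = random.Random(seed)
    n = len(survey_relatives)
    survey_reps, blend_reps = [], []
    for _ in range(reps):
        resample = [rng.choice(survey_relatives) for _ in range(n)]  # with replacement
        s = statistics.fmean(resample)
        survey_reps.append(s)
        blend_reps.append(w_survey * s + (1 - w_survey) * ctd_relative)
    return statistics.pvariance(survey_reps), statistics.pvariance(blend_reps)

# Hypothetical (value in dollars, quantity) shipments for a base and a current month.
base = [(12000.0, 100), (8000.0, 80)]
current = [(12600.0, 100), (8400.0, 82)]
ctd_rel = unit_value(current) / unit_value(base)

# Hypothetical survey price relatives.
survey = [1.02, 1.05, 0.99, 1.03, 1.04, 1.01, 1.06, 1.00]
var_survey, var_blend = bootstrap_blend_variance(survey, ctd_rel, w_survey=0.6)
print(ctd_rel, var_survey, var_blend)
```

Under these assumptions the blended variance is just w_survey² times the survey variance, so the CTD component contributes nothing; that gap is exactly why error sources specific to CTD need their own treatment.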

Session: Navigating Complexity: Recent Advances in Analysis of Data from Complex Surveys

Time: 10:30 AM–12:20 PM

Location: Room CC-G132

Understanding Mentions of BLS Products Through Topic Modeling of News Articles

BLS author(s): Erin Boon

The Bureau of Labor Statistics (BLS) measures labor market activity, working conditions, price changes, and productivity in the U.S. economy to support public and private decision making. To meet this mission, BLS not only publishes statistics and research on its own website but also seeks to understand when and where its products are mentioned in online news sources. Making sense of this huge volume of news articles is impossible without a means of summarizing and grouping them. Using article data collected by a third-party service, we experimented with several methods to model the topics contained in news articles that mention BLS products. We compared and optimized candidate models with a goal of meeting the needs of internal stakeholders who use the output to help evaluate the impact of their outreach efforts. Ultimately, we selected a model that provided the best balance of evaluation metrics and utility to these users. This presentation will include a summary of the models we explored and the process we developed to compare them.
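The abstract does not name the evaluation metrics used to compare topic models. As one illustration of a widely used metric, the sketch below computes UMass topic coherence: for a topic's top words ordered by importance, it sums log((D(w_i, w_j) + 1) / D(w_j)) over word pairs, where D counts documents containing the word(s). The tokenized "articles" and topics are hypothetical.

```python
import math

def umass_coherence(top_words, documents):
    """UMass coherence of a topic given its top words (ordered) and tokenized documents."""
    docs = [set(d) for d in documents]

    def doc_freq(*words):
        # Number of documents containing every word in `words`.
        return sum(1 for d in docs if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            score += math.log((doc_freq(top_words[i], top_words[j]) + 1) / doc_freq(top_words[j]))
    return score

# Hypothetical tokenized news snippets.
docs = [
    ["unemployment", "rate", "jobs"],
    ["cpi", "inflation", "prices"],
    ["jobs", "report", "unemployment"],
    ["inflation", "cpi", "rent"],
]
coherent = umass_coherence(["unemployment", "jobs", "rate"], docs)
mixed = umass_coherence(["unemployment", "inflation", "rent"], docs)
print(coherent, mixed)
```

Higher (less negative) scores indicate top words that co-occur more often; in practice such scores are usually weighed against human judgment of usefulness, as the abstract's focus on stakeholder needs suggests.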

Session: Applications of Text Analysis

Time: 10:30 AM–12:20 PM

Location: CC-D138

Exploratory Analysis that Redefined the Parameter of a Variable in the Consumer Price Index Housing Age

BLS author(s): Alice Yu, Ayme Tomson, Ben Houck, and Chung Wing Tse

The rent and owners' equivalent rent (OER) indexes in the Consumer Price Index (CPI) use building age-bias and structural change factors. These factors are calculated each December, run in January, and used in production for the following 12 months. They are calculated with a multivariable regression model that has 35 independent variables and one dependent variable (the unit's rent). One of the independent variables is the product of the building's age (the current year minus the year the structure was built) and a binary variable, "old". The "old" variable distinguishes units built in 1919 or earlier from units built in 1920 or later. Because this cutoff is defined statically, the "old" parameter underperforms in statistical significance and restricts CPI methodology. This research identifies an alternative, less restrictive definition of the "old" parameter that achieves higher statistical significance.
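The age-times-"old" regressor described above can be sketched as follows, assuming (as an illustration) that "old" equals 1 for units built in or before a cutoff year and 0 otherwise. The 1919 cutoff comes from the text; the alternative cutoff and the years built are hypothetical, and refitting the 35-variable regression is not shown.

```python
def age_old_term(year_built, current_year, cutoff_year=1919):
    """Interaction regressor: building age times an 'old' indicator.

    Assumes (hypothetically) old = 1 for units built in or before cutoff_year, else 0.
    """
    age = current_year - year_built
    old = 1 if year_built <= cutoff_year else 0
    return age * old

# Hypothetical years built; compare the static 1919 cutoff with an alternative.
years_built = [1905, 1919, 1920, 1948, 1975, 2001]
for cutoff in (1919, 1950):
    column = [age_old_term(y, 2024, cutoff) for y in years_built]
    print(cutoff, column)
```

Each candidate cutoff yields a different regressor column; comparing definitions would mean refitting the model with each column and examining the coefficient's significance.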

Session: Recent Advances in Estimation Methods for Survey Data

Time: 2:00 PM–3:50 PM

Location: Room CC-C121

Evaluation of a Modified Gross Flows Estimator for The Current Population Survey

BLS author(s): Stephen Miller and Connor Doherty

We present a gross flows estimation approach that builds on the paper by Stasny and Fienberg (1985). Our method uses population-weighted estimates from two consecutive months of matched Current Population Survey (CPS) data, applying the sampling weights from each of the two matched months to produce two sets of partial gross flows tables. We then use a modeling approach from Stasny and Fienberg to reconcile the two partial tables and produce an estimate of the population gross flows table. We present closed-form solutions, which require an optimization step to determine the Lagrange multipliers. We use the method to produce estimated gross flows tables for the CPS from 2003 to 2023 and estimate the variance of the estimates by replication.
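The first step described above, building two partial gross flows tables from the same matched records using each month's sampling weights, can be sketched as below. The records are hypothetical, labor force statuses are coded E (employed), U (unemployed), and N (not in the labor force), and the Stasny-Fienberg reconciliation step is deliberately not shown.

```python
from collections import defaultdict

def partial_flow_tables(matched):
    """Build two weighted month-1 x month-2 flow tables from matched records.

    matched: iterable of (status_m1, status_m2, weight_m1, weight_m2),
    with status in {"E", "U", "N"}.
    """
    table_w1, table_w2 = defaultdict(float), defaultdict(float)
    for s1, s2, w1, w2 in matched:
        table_w1[(s1, s2)] += w1  # flow estimated with month-1 sampling weights
        table_w2[(s1, s2)] += w2  # same flow estimated with month-2 sampling weights
    return dict(table_w1), dict(table_w2)

def row_margins(table):
    """Month-1 status totals implied by a flow table."""
    margins = defaultdict(float)
    for (s1, _), weight in table.items():
        margins[s1] += weight
    return dict(margins)

# Hypothetical matched records.
matched = [
    ("E", "E", 1500, 1520), ("E", "U", 300, 310), ("U", "E", 250, 240),
    ("U", "U", 400, 420), ("N", "N", 2000, 1990), ("N", "E", 100, 95),
]
t1, t2 = partial_flow_tables(matched)
print(row_margins(t1), row_margins(t2))
```

Because the two weight sets differ, the two tables disagree on both cells and margins; reconciling them into one population table is the job of the modeling approach described above.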

Session: Recent Advances in Estimation Methods for Survey Data

Time: 2:00 PM–3:50 PM

Location: Room CC-C121

Tuesday, August 6, 2024

Estimating Preferences Over Data to Inform Statistical Disclosure Control Decisions

BLS author(s): Elan Segarra

Most implementations of statistical disclosure control (SDC) maximize the number of published statistics rather than assess their relative value to downstream consumers. This project provides a novel framework for estimating consumer demand for statistics and incorporating these estimates into SDC methods. We model consumer demand for statistics using a nested logit discrete choice model in which statistics are bundled into published data products that vary by multiple characteristics, such as conditioning or presentation type. As a proof of concept, we estimate model parameters using pageview data on requested statistics and tables from the Census of Fatal Occupational Injuries, and we find significant heterogeneity in valuation across characteristics. The estimates are then used to calculate valuations of potential new statistics, which can be compared when making SDC decisions. For example, in a cell suppression problem (CSP), we can use the estimated valuation of each cell to ensure that the most valuable cells remain unsuppressed. In the context of formal privacy, these valuations suggest a heterogeneous allocation of a privacy budget across several potentially publishable data products.
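To show how a nested logit model turns utilities into choice probabilities, the sketch below uses the standard two-level form: within nest k, P(a | k) = exp(V_a / λ_k) / Σ exp(V_j / λ_k); the inclusive value is IV_k = log Σ exp(V_j / λ_k); and P(k) = exp(λ_k IV_k) / Σ exp(λ_l IV_l). The products, nests, utilities, and λ values are hypothetical and are not estimates from this project.

```python
import math

def nested_logit_probs(utilities, nests, lambdas):
    """Choice probabilities under a two-level nested logit.

    utilities: {alternative: V}; nests: {nest: [alternatives]};
    lambdas: {nest: dissimilarity parameter in (0, 1]}.
    """
    inclusive = {
        k: math.log(sum(math.exp(utilities[a] / lambdas[k]) for a in alts))
        for k, alts in nests.items()
    }
    denom = sum(math.exp(lambdas[k] * inclusive[k]) for k in nests)
    probs = {}
    for k, alts in nests.items():
        p_nest = math.exp(lambdas[k] * inclusive[k]) / denom
        within = sum(math.exp(utilities[a] / lambdas[k]) for a in alts)
        for a in alts:
            probs[a] = p_nest * math.exp(utilities[a] / lambdas[k]) / within
    return probs

# Hypothetical data products grouped by presentation type.
utilities = {"industry_table": 1.0, "occupation_table": 0.5, "state_chart": 0.2}
nests = {"tables": ["industry_table", "occupation_table"], "charts": ["state_chart"]}
probs = nested_logit_probs(utilities, nests, {"tables": 0.5, "charts": 1.0})
print(probs)
```

With every λ equal to 1 the model collapses to a plain multinomial logit; λ below 1 makes products within a nest closer substitutes for one another.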

Session: Data Privacy and Survey Data Discoveries

Time: 8:30 AM–10:20 AM

Location: CC-B110

Examining Household Income and Response Rates in the Consumer Expenditure Survey

BLS author(s): Lauren Vermeer

Income is an important analysis variable in household surveys. In 2013, the Consumer Expenditure Survey (CE) program identified a variable in a publicly available dataset from the Internal Revenue Service (IRS) that gives the average adjusted gross income (AGI) for most U.S. ZIP Codes. This ZIP Code-level AGI variable was found to be correlated with CE's response rates. The variable was introduced into CE's nonresponse adjustment procedure, becoming the first variable in that procedure drawn from outside the survey's sampling frame. Because the functionality of variables used in survey procedures should be checked and confirmed periodically, the variable was reexamined in 2023. This paper describes both the reexamination process and its results, which show that the relationship between income and household response rates is less stable than previously determined.
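One common way an auxiliary variable such as ZIP Code-level AGI enters a nonresponse adjustment is through weighting classes: within each class, respondents' base weights are inflated by total base weight divided by respondent base weight. The sketch below assumes that weighting-class form with hypothetical income classes and units; CE's actual adjustment procedure may differ.

```python
from collections import defaultdict

def nonresponse_factors(units):
    """Weighting-class adjustment factors: total base weight / respondent base weight.

    units: iterable of (income_class, base_weight, responded).
    """
    total, resp = defaultdict(float), defaultdict(float)
    for cls, weight, responded in units:
        total[cls] += weight
        if responded:
            resp[cls] += weight
    return {cls: total[cls] / resp[cls] for cls in total if resp[cls] > 0}

# Hypothetical units classified by ZIP Code-level AGI class.
units = [
    ("low", 100, True), ("low", 100, False), ("low", 100, True),
    ("high", 100, True), ("high", 100, True), ("high", 100, False), ("high", 100, False),
]
factors = nonresponse_factors(units)
adjusted = [(cls, w * factors[cls]) for cls, w, r in units if r]
print(factors, sum(w for _, w in adjusted))
```

The adjustment helps only if response propensity actually differs across the classes, which is why a weakening income-response relationship matters for the procedure.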

Session: Recent Applications of Sampling and Survey Methods

Time: 10:30 AM–12:20 PM

Location: Room CC-D139

Standard Errors of Nonstandard Estimates in the Current Population Survey

BLS author(s): Morgan Heyde and Justin McIllece

Since January 2023, the U.S. Bureau of Labor Statistics (BLS) has produced monthly standard errors for primary Current Population Survey (CPS) labor force estimates using the newly developed GVF (generalized variance function) Production System. In its present form, the GVF Production System computes modeled standard errors for estimates of the following types: levels, or counts; rates, such as the official U.S. unemployment rate; mean and median weeks unemployed; and hourly and weekly earnings percentiles. However, many other CPS labor force series of significant economic interest, such as female-to-male earnings ratios and average hours at work for various demographic groups, are not covered by GVF models. In this paper, research models are developed for these "nonstandard" CPS estimates, selected statistical inferences are drawn to evaluate the marginal utility of modeling the variances relative to direct replication, and the potential for implementation in the GVF Production System is discussed.
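As background on what a GVF does, a classic form for level estimates models the relative variance as relvar(x) = a + b / x, so that the modeled standard error is sqrt(a x² + b x). The sketch below fits a and b by ordinary least squares on noise-free hypothetical data; it is not the model used in the GVF Production System, which may use different functional forms and weighting.

```python
import math

def fit_gvf(estimates, relvariances):
    """Ordinary least squares fit of relvariance = a + b / x."""
    z = [1.0 / x for x in estimates]
    n = len(z)
    z_bar, y_bar = sum(z) / n, sum(relvariances) / n
    sxy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, relvariances))
    sxx = sum((zi - z_bar) ** 2 for zi in z)
    b = sxy / sxx
    a = y_bar - b * z_bar
    return a, b

def gvf_standard_error(x, a, b):
    """Modeled standard error sqrt(a*x^2 + b*x), floored at zero."""
    return math.sqrt(max(a * x * x + b * x, 0.0))

# Hypothetical parameters used to generate noise-free relvariances for the demo.
a_true, b_true = -1e-5, 2500.0
levels = [50_000, 200_000, 1_000_000, 5_000_000]
relvars = [a_true + b_true / x for x in levels]
a_hat, b_hat = fit_gvf(levels, relvars)
print(a_hat, b_hat, gvf_standard_error(1_000_000, a_hat, b_hat))
```

Ratios and averages, such as the "nonstandard" estimates above, do not generally follow this level-based form, which is why separate research models are needed for them.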

Session: Recent Applications of Sampling and Survey Methods

Time: 10:30 AM–12:20 PM

Location: Room CC-D139

Comparison of Variance Estimators for Self-Representing Primary Sampling Units

BLS author(s): Stephen Ash

Many surveys estimate variances with the balanced repeated replication (BRR) variance estimator. For self-representing (SR) primary sampling units (PSUs), surveys sometimes split each PSU into parts that are paired into pseudo-strata, and BRR is then applied to the pseudo-strata. However, there is little guidance on how many pseudo-strata to split the SR strata into, or on how (or whether) the sort order should be used to split the sample when it was selected by systematic random sampling. Our research considered twelve different applications of the BRR variance estimator that varied by the number of pseudo-strata formed and by how the sort order of a systematic random sample was used to split the PSU. We also included variations of the delete-a-group jackknife and successive difference replication variance estimators. Using simulations with data from the Consumer Expenditure Surveys, we found that the BRR variance estimator that split the sample of the SR PSUs into the largest possible number of replicates, and used the sort order to do so, was the best overall variance estimator for both national-level estimates and individual PSU-level estimates.
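The sketch below illustrates BRR on pseudo-strata formed from an SR PSU. As a hypothetical splitting rule, consecutive units of a systematic sample (in selection order) are paired into pseudo-strata; each replicate keeps one half of every pseudo-stratum, chosen by the signs of a Sylvester-construction Hadamard matrix, and doubles it. The unit values are invented, and this is not the specific set of estimators compared in the study.

```python
def sylvester_hadamard(order):
    """Hadamard matrix of size `order` (a power of 2) by Sylvester's construction."""
    h = [[1]]
    while len(h) < order:
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h

def pseudo_strata_by_sort_order(weighted_values):
    """Pair consecutive units of a systematic sample into (half 1, half 2) pseudo-strata."""
    if len(weighted_values) % 2:
        raise ValueError("this sketch needs an even number of units")
    return [(weighted_values[i], weighted_values[i + 1]) for i in range(0, len(weighted_values), 2)]

def brr_variance_total(pairs):
    """BRR variance of an estimated total from (half 1, half 2) weighted totals."""
    order = 1
    while order < len(pairs) + 1:
        order *= 2
    h = sylvester_hadamard(order)
    full = sum(a + b for a, b in pairs)
    reps = []
    for row in h:
        # Column 0 is all +1, so columns 1..H choose the retained half in each pseudo-stratum.
        reps.append(sum(2 * (a if row[s + 1] == 1 else b) for s, (a, b) in enumerate(pairs)))
    return sum((r - full) ** 2 for r in reps) / len(reps)

# Hypothetical weighted values for six sampled units, in systematic-sample sort order.
pairs = pseudo_strata_by_sort_order([10, 12, 8, 5, 7, 7])
print(pairs, brr_variance_total(pairs))
```

Because the Hadamard columns are orthogonal, the BRR variance of a linear total reduces to the sum of squared within-pair differences, Σ (y_h1 - y_h2)², which is a handy check on an implementation.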

Session: Challenges in Error Estimation for Survey Data

Time: 10:30 AM–12:20 PM

Location: Room CC-G132

Comparison of recent methods for combining probability and non-probability samples

BLS author(s): Julie Gershunskaya

The recent proliferation of computers and the internet has opened new opportunities for collecting and processing data. However, such data are often obtained without a well-planned probability survey design, and these non-probability samples cannot automatically be regarded as representative of the population of interest. Several methods for estimation and inference from non-probability samples have been developed in recent years. These methods assume that non-probability sample selection is governed by an underlying latent random mechanism. The basic idea is to use information collected from a probability ("reference") sample to uncover the latent non-probability survey participation probabilities (also known as "propensity scores") and to use them in estimating target finite population parameters. In this paper, we review several recently published methods for estimating non-probability survey participation probabilities, compare their theoretical properties, and study their relative performance in simulations.
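As one concrete example of the idea described above (not necessarily among the methods compared in the paper), the pseudo-likelihood approach of Chen, Li, and Wu (2020) models participation as π(x) = expit(a + b·x) and solves Σ_A x_i − Σ_B d_j π(x_j) x_j = 0, where A is the non-probability sample and B is the reference sample with design weights d_j. The sketch fits one covariate by Newton's method; all data are hypothetical.

```python
import math

def expit(u):
    return 1.0 / (1.0 + math.exp(-u))

def clw_propensities(x_np, x_ref, d_ref, iters=50):
    """Newton solve of the pseudo score equations for pi(x) = expit(a + b*x)."""
    n_a, n_hat = len(x_np), sum(d_ref)
    a, b = math.log(n_a / (n_hat - n_a)), 0.0  # start from a common propensity n_A / N_hat
    for _ in range(iters):
        pis = [expit(a + b * x) for x in x_ref]
        # Score U = sum_A [1, x] - sum_B d * pi * [1, x].
        u0 = n_a - sum(d * p for d, p in zip(d_ref, pis))
        u1 = sum(x_np) - sum(d * p * x for d, p, x in zip(d_ref, pis, x_ref))
        # Information H = sum_B d * pi * (1 - pi) * [1, x][1, x]^T (2 x 2, symmetric).
        h00 = sum(d * p * (1 - p) for d, p in zip(d_ref, pis))
        h01 = sum(d * p * (1 - p) * x for d, p, x in zip(d_ref, pis, x_ref))
        h11 = sum(d * p * (1 - p) * x * x for d, p, x in zip(d_ref, pis, x_ref))
        det = h00 * h11 - h01 * h01
        a += (h11 * u0 - h01 * u1) / det
        b += (h00 * u1 - h01 * u0) / det
    return a, b

def hajek_mean(y_np, x_np, a, b):
    """Inverse-propensity (Hajek) estimate of a population mean from the non-probability sample."""
    inv = [1.0 / expit(a + b * x) for x in x_np]
    return sum(w * y for w, y in zip(inv, y_np)) / sum(inv)

# Hypothetical data: reference sample covers x = 0..9; the volunteer sample skews high.
x_ref, d_ref = list(range(10)), [100.0] * 10
x_np = [3, 5, 6, 7, 7, 8, 8, 9, 9, 9]
y_np = [2.0, 2.5, 3.1, 3.4, 3.6, 4.0, 4.1, 4.6, 4.4, 4.8]
a_hat, b_hat = clw_propensities(x_np, x_ref, d_ref)
print(a_hat, b_hat, hajek_mean(y_np, x_np, a_hat, b_hat))
```

The first score equation forces the estimated propensities, summed over the weighted reference sample, to match the non-probability sample size; the estimator is only as good as the assumption that participation depends on the observed covariates.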

Session: Data Integration in Non-probability Sampling and Small Area Estimation for Official Statistics

Time: 2:00 PM–3:50 PM

Location: Room CC-F151

Wednesday, August 7, 2024

Challenges of Estimating Inflation in Small Areas in Official Statistics

BLS author(s): Vladislav Beresovsky and Terrance Savitsky

The Consumer Price Index (CPI) survey is designed to measure inflation by collecting price quotes in sampled Core-Based Statistical Areas (CBSAs) of the U.S. The current design provides reliable estimates of relative price change, with measures of uncertainty, for a limited number of large self-representing (SR) CBSAs and for Census divisions. To produce estimates for other localities (such as states), we use area-level modeling to mass-impute inflation measures in all CBSAs in the U.S. Our project faces multiple challenges, including only approximately estimated variances of the direct estimates, data sparsity, and sampled CBSAs that poorly represent the population. We co-model point and variance estimates in small areas to mitigate the effect of unreliably estimated variances, and we employ carefully constructed, highly informative priors, data clustering, and spatial modeling to compensate for the sparsity and lack of representativeness of the available sample. We investigate how imputed inflation characteristics in small domains depend on different model assumptions.
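The basic logic of area-level modeling can be sketched with the classic Fay-Herriot predictor, which is simpler than the Bayesian co-modeling described above: a sampled area's prediction shrinks its direct estimate y toward a model-based synthetic value with weight γ = A / (A + D), where D is the direct estimate's sampling variance and A is the model variance; a nonsampled area receives the synthetic value (mass imputation). All inputs below are hypothetical.

```python
def fay_herriot_predict(direct, sampling_var, synthetic, model_var):
    """Composite small-area prediction; direct=None marks a nonsampled area."""
    if direct is None:
        return synthetic  # no direct estimate: rely entirely on the model
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1 - gamma) * synthetic

# Hypothetical annual inflation rates (percent) for one sampled and one nonsampled area.
sampled = fay_herriot_predict(direct=3.4, sampling_var=1.0, synthetic=2.8, model_var=0.5)
nonsampled = fay_herriot_predict(direct=None, sampling_var=None, synthetic=2.8, model_var=0.5)
print(sampled, nonsampled)
```

Because γ depends directly on D, an unreliable sampling-variance estimate distorts the amount of shrinkage, which motivates modeling point and variance estimates jointly.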

Session: Challenging Aspects of Small Area and Survey Research

Time: 10:30 AM–12:20 PM

Location: Room CC-256

Vector-weighted Mechanisms for Utility Maximization under Differential Privacy

BLS author(s): Terrance Savitsky

We address the practical implementation of a risk-weighted pseudo posterior synthesizer for microdata dissemination with a new re-weighting strategy that maximizes the utility of released synthetic data at any level of formal privacy guarantee. Our re-weighting strategy applies to any vector-weighted pseudo posterior mechanism, under which a vector of observation-indexed weights is used to downweight likelihood contributions for high-disclosure-risk records. We demonstrate our method on two different vector-weighted schemes that target high-risk records. Our new method for constructing record-indexed downweighting maximizes data utility under any privacy budget for the vector-weighted synthesizers by adjusting the by-record weights such that their individual Lipschitz bounds approach the bound for the entire database. Our method achieves an asymptotic differential privacy (aDP) guarantee, globally, over the space of databases. We illustrate our methods using simulated, highly skewed count data and compare the results to a scalar-weighted synthesizer under the Exponential Mechanism (EM). We also apply our methods to a sample of the Survey of Doctorate Recipients.
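The sketch below illustrates the basic ingredients named above on skewed count data: a vector-weighted pseudo log-likelihood Σ α_i log p(x_i | λ) under a Poisson model, and a record-level bound max |α_i log p(x_i | λ)| taken over a grid of λ values as a crude stand-in for the parameter space or posterior draws. The risk-based weights α_i are a hypothetical rule, not the re-weighting strategy proposed in the paper.

```python
import math

def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def weighted_loglik(data, alphas, lam):
    """Vector-weighted pseudo log-likelihood: sum_i alpha_i * log p(x_i | lam)."""
    return sum(a * poisson_logpmf(x, lam) for x, a in zip(data, alphas))

def lipschitz_bound(data, alphas, lam_grid):
    """Largest |alpha_i * log p(x_i | lam)| over records and a grid of parameter values."""
    return max(abs(a * poisson_logpmf(x, lam)) for x, a in zip(data, alphas) for lam in lam_grid)

# Hypothetical skewed counts; the extreme record (25) is treated as the highest risk.
data = [0, 1, 1, 2, 3, 2, 1, 0, 25]
alphas = [1 - 0.9 * x / max(data) for x in data]  # hypothetical risk-based downweighting
lam_grid = [0.5 * k for k in range(1, 11)]

bound_unweighted = lipschitz_bound(data, [1.0] * len(data), lam_grid)
bound_weighted = lipschitz_bound(data, alphas, lam_grid)
lam_hat = sum(a * x for a, x in zip(alphas, data)) / sum(alphas)  # weighted Poisson MLE
print(bound_unweighted, bound_weighted, lam_hat)
```

Downweighting the outlying record shrinks the overall bound, which is what ties the weights to the privacy guarantee, but it also pulls the fitted rate away from the data; choosing weights to balance those two effects is the problem the abstract addresses.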

Session: Challenging Aspects of Small Area and Survey Research

Time: 10:30 AM–12:20 PM

Location: Room CC-256

From Typewriters to Telecommuting: Four Decades of Remote Work Insights from the NLSY79

BLS author(s): Avery Baumgart

As remote work continues to gain prominence in contemporary labor markets, a growing set of measures has arisen to track its use, its determinants, and its effects. This study provides context for future work by examining a measure available in the National Longitudinal Survey of Youth 1979 (NLSY79) that has tracked work from home over the last 40 years. Using this measure, the study documents remote work trends within the cohort over their lives and examines the dynamic relationship between sociodemographic characteristics and remote work. From 1988 to 2020, total time worked from home increased, and time worked from home made up a larger percentage of total reported work time for employed workers. The bulk of this increase occurred between 2018 and 2020, when the percentage of total hours worked from home within the cohort rose from 8.6 percent to 21.1 percent. These changes are tied to demographic shifts among employed workers who work from home full time.

Session: Contributed Poster Presentations: Business and Economic Statistics Section

Time: 10:30 AM–12:20 PM

Location: Room CC-Hall CD

Thursday, August 8, 2024

Research US Producer Price Indexes Using a Geometric Mean Formula

BLS author(s): Robert Martin, Andy Sadler, Sara Stanley, William Thompson, and Jonathan Weinhagen

The Bureau of Labor Statistics (BLS) recently introduced a research series that uses a geometric Young formula to calculate elementary-level Producer Price Indexes (PPI). The geometric Young formula has better axiomatic properties than the modified Laspeyres formula currently in use. In most cases, indexes calculated with the geometric Young rise 0.1 to 0.3 percentage points less per year than indexes calculated with the modified Laspeyres. However, for wholesale and retail trade, as well as for some other services, the differences are much larger. As a result, using the geometric Young at the elementary level lowers the U.S. PPI for Final Demand by 0.56 percentage points per year over the 11-year period from 2012 through 2022, a larger difference than has previously been found for the U.S. Consumer Price Index.
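The gap between the two formulas can be illustrated by comparing a fixed-weight arithmetic mean of price relatives (Young type) with the geometric Young, Π (p_t / p_0)^w_i, using weights w_i that sum to one. Treating the modified Laspeyres as a simple arithmetic Young index is a simplifying assumption of this sketch, and the weights and price relatives are hypothetical.

```python
import math

def young_index(weights, relatives):
    """Fixed-weight arithmetic mean of price relatives (Young-type index)."""
    return sum(w * r for w, r in zip(weights, relatives))

def geometric_young_index(weights, relatives):
    """Fixed-weight geometric mean of price relatives (geometric Young index)."""
    return math.exp(sum(w * math.log(r) for w, r in zip(weights, relatives)))

# Hypothetical revenue weights and price relatives for one elementary aggregate.
weights = [0.6, 0.4]
relatives = [1.05, 0.98]
arith = young_index(weights, relatives)
geo = geometric_young_index(weights, relatives)
print(arith, geo, 100 * (arith - geo))  # last value: gap in percentage points
```

By the inequality of arithmetic and geometric means, the geometric Young never exceeds the arithmetic version, and the gap grows with the dispersion of price relatives, which is consistent with larger differences in industries such as wholesale and retail trade.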

Session: Time Series Analysis and Miscellaneous Topics

Time: 8:30 AM–10:20 AM

Location: Room CC-F152

Last Modified Date: August 2, 2024