Predicting Industry Output with Statistical Learning Methods

Peter B. Meyer and Wendy L. Martinez

Abstract

The U.S. Bureau of Labor Statistics uses estimates of industry output to construct preliminary annual productivity statistics for U.S. manufacturing industries. The official measures of output become available much later. We examine how well several alternative models predict output for each of the 21 industries that make up U.S. manufacturing, using data sources available within four months of the end of the reference year. To measure prediction quality, we use a form of year-wise cross-validation: output for any year from 2007 to 2014, even the first, is predicted from a model fit to the other years, and we calculate the implied prediction error. This error metric lets us evaluate each prediction method on each industry over eight years. Our predictors are highly correlated, which reduces prediction accuracy. Several methods that address this collinearity produce more accurate out-of-sample estimates than ordinary least squares (OLS) regression. Compared with OLS, selecting only the best three regressors reduces prediction error by 15%, and principal components regression reduces it by about 20%.
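The year-wise cross-validation described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the error metric is the root mean squared error of the held-out-year predictions and that the fitted model is OLS with an intercept. The function names are hypothetical, and the abstract does not state the paper's exact error measure or how the three best regressors are chosen.

```python
import math


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        if abs(m[col][col]) < 1e-12:
            raise ValueError("singular system (regressors perfectly collinear)")
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_fit(X, y):
    """Fit OLS with an intercept by solving the normal equations (Z'Z) b = Z'y."""
    Z = [[1.0] + list(row) for row in X]
    k = len(Z[0])
    ztz = [[sum(z[i] * z[j] for z in Z) for j in range(k)] for i in range(k)]
    zty = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(k)]
    return solve(ztz, zty)


def ols_predict(beta, row):
    """Predict one observation from fitted coefficients (intercept first)."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def leave_one_year_out_rmse(years, X, y, cols):
    """Hold out each year in turn, fit on the remaining years using only the
    regressor columns in `cols`, predict the held-out year, and return the RMSE
    of those out-of-sample errors. `years` labels the rows of X and y."""
    errors = []
    for t in range(len(years)):
        train = [i for i in range(len(years)) if i != t]
        X_train = [[X[i][c] for c in cols] for i in train]
        y_train = [y[i] for i in train]
        beta = ols_fit(X_train, y_train)
        pred = ols_predict(beta, [X[t][c] for c in cols])
        errors.append(y[t] - pred)
    return math.sqrt(sum(e * e for e in errors) / len(errors))
```

Passing every column index in `cols` gives an OLS baseline, and passing a three-element `cols` gives a restricted-regressor model scored on the same held-out years. A principal components variant would, under the same scheme, replace the columns with leading component scores computed from the training years only, so the held-out year does not leak into the components. With eight years, each fit uses seven observations, which limits how many regressors plain OLS can include.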