A Comparison of Record Linkage Techniques

Lowell G. Mason

Abstract

It has become increasingly common to create new statistical products by integrating existing data rather than engaging in new data collection; using existing data sources is less expensive and does not increase respondent burden. However, it is usually not possible to integrate multiple data sources satisfactorily without manual intervention. An example is the integration of the Bureau of Economic Analysis (BEA) enterprise-level data on Foreign Direct Investment (FDI) with establishment data from the Bureau of Labor Statistics' Quarterly Census of Employment and Wages (QCEW). In this particular case, the initial error rate was 87.7%. After manual review and correction, the error rate was reduced to 19.0%. The labor cost, however, was considerable: almost 1,510.5 hours. To reduce linkage error and labor costs, we implement several record linkage techniques. We consider supervised learning techniques, such as Support Vector Machines (SVM) and Random Forests. Finally, as a baseline comparison, we implement the methods developed by Fellegi and Sunter (1969).
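
For readers unfamiliar with the Fellegi and Sunter (1969) baseline, the sketch below illustrates its general scoring idea: each comparison field contributes a log-likelihood-ratio weight based on the probability of agreement among true matches (m) and among non-matches (u), and the summed weight is compared against two thresholds. This is only a minimal illustration under stated assumptions. The field names, the m- and u-probabilities, and the thresholds are hypothetical, and nothing here is taken from the paper's actual implementation or data.

```python
import math

# Hypothetical (m, u) probabilities per comparison field:
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
M_U = {
    "name": (0.95, 0.01),
    "zip": (0.90, 0.05),
    "industry": (0.80, 0.10),
}


def match_weight(agreements, m_u=M_U):
    """Sum the Fellegi-Sunter log2 weights over the compared fields.

    `agreements` maps a field name to True (fields agree) or False.
    """
    total = 0.0
    for field, agrees in agreements.items():
        m, u = m_u[field]
        if agrees:
            total += math.log2(m / u)              # agreement weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight
    return total


def classify(weight, lower=0.0, upper=8.0):
    """Three-way decision; the thresholds are illustrative assumptions."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # would be sent to manual (clerical) review


pair = {"name": True, "zip": False, "industry": False}
print(classify(match_weight(pair)))
```

In practice the m- and u-probabilities are estimated from the data rather than fixed by hand, and the two thresholds are chosen to trade off linkage error against the volume of pairs sent to manual review.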