Creating a Taxonomy of Statistical Methods Using Text Analysis

Wendy L. Martinez and Terrance D. Savitsky

Abstract

The United Nations Economic Commission for Europe (UNECE) holds an annual workshop on Statistical Data Editing with a focus on official surveys. The organizers of the 2017 workshop formed subgroups tasked with proposing ways to foster the implementation of good practices and international collaboration among the statistical offices of member countries. One proposal was to classify existing methods for data editing and imputation based on papers presented at previous UNECE work sessions on data editing. Another was to create an indexed and searchable inventory of these papers using a taxonomy. This paper describes research addressing the first idea: constructing a taxonomy of the topics addressed by the UNECE data editing group. To do this, we downloaded all papers from the annual work sessions, converted them to machine-readable format, and applied text analysis approaches to create a taxonomy from them. We describe the process and tools used to create the taxonomy so that others can apply the same ideas to their own document collections.
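
The abstract does not say which text analysis techniques were used, so the following is only an illustrative sketch of one common way to group documents into topics: TF-IDF weighting, cosine similarity, and average-linkage agglomerative clustering, written with the Python standard library. The document names and texts, the stopword list, and all function names are hypothetical and are not drawn from the UNECE collection or from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for converted paper text (not real UNECE papers).
docs = {
    "paper_a": "imputation of missing values using donor imputation methods",
    "paper_b": "donor imputation and model based imputation for missing items",
    "paper_c": "selective editing rules to detect errors in survey data",
    "paper_d": "automatic editing rules and error detection for survey data",
}

# Assumed, deliberately tiny stopword list for this example only.
STOPWORDS = {"of", "and", "to", "in", "for", "the", "using", "a"}


def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def tfidf_vectors(docs):
    """Return {doc name: {term: tf * (log(n / df) + 1)}}."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n = len(tokens)
    df = Counter(term for toks in tokens.values() for term in set(toks))
    return {
        name: {t: c * (math.log(n / df[t]) + 1.0) for t, c in Counter(toks).items()}
        for name, toks in tokens.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def agglomerate(vectors, k):
    """Merge the most similar pair of clusters (average linkage) until k remain."""
    clusters = [[name] for name in sorted(vectors)]

    def linkage(a, b):
        sims = [cosine(vectors[x], vectors[y]) for x in a for y in b]
        return sum(sims) / len(sims)

    while len(clusters) > k:
        pairs = (
            (i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))
        )
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def top_terms(cluster, vectors, n=3):
    """Use the highest total TF-IDF terms as a rough label for a cluster."""
    totals = Counter()
    for name in cluster:
        totals.update(vectors[name])
    return [t for t, _ in totals.most_common(n)]


vectors = tfidf_vectors(docs)
clusters = agglomerate(vectors, k=2)
for cluster in clusters:
    print(sorted(cluster), top_terms(cluster, vectors))
```

Recording the sequence of merges, rather than stopping at a fixed number of clusters, would yield a tree of nested groups, which is closer in shape to a taxonomy. For a real collection, a library such as scikit-learn or SciPy would likely be a more practical choice than this hand-rolled version.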