Creating a Taxonomy of Statistical Methods Using Text Analysis

Wendy L. Martinez and Terrance D. Savitsky

Abstract

The United Nations Economic Commission for Europe (UNECE) holds an annual workshop on Statistical Data Editing with a focus on official surveys. The organizers of the 2017 workshop formed subgroups tasked with proposing ideas to foster the implementation of good practices and international collaboration among the statistical offices of member countries. One proposal from the subgroups was to classify existing methods for data editing and imputation based on papers presented in previous UNECE work sessions on data editing. Another was to create an indexed and searchable inventory of these papers using a taxonomy. This paper describes research addressing the first idea by constructing a taxonomy of topics covered by the UNECE data editing group. To do this, we downloaded all papers from the annual work sessions, converted them to a machine-readable format, and applied text analysis approaches to create a taxonomy based on the papers. We describe the process and tools used to create the taxonomy so that others can apply the same ideas to their own document collections.