In the American Time Use Survey (ATUS), interviewers use a set of open-ended questions to walk respondents chronologically through their activities during the prior 24 hours. In contrast, other surveys use stylized questions, which ask people about “the average, normal, or typical” time spent on activities. Estimates of sleep duration from the ATUS and other diary measures exceed those from stylized questions by approximately 1.7 hours, a difference termed the sleep gap. Our research draws on a variety of evaluation methods (behavior coding, cognitive interviews, quantitative research, and a validation study using sensor data) to examine reasons for the discrepancy between diary and stylized sleep measures and to uncover potential sources of measurement error that may contribute to the sleep gap. We discuss the strengths and weaknesses of each method and how the methods can build on one another in the questionnaire evaluation process to give a deeper understanding of a substantive survey methods issue.