National Longitudinal Surveys
The NLS surveys are based upon stratified multistage random samples, with oversamples of blacks in all cohorts, oversamples of Hispanics in the NLSY79 and NLSY97, and additional oversamples of disadvantaged nonblack non-Hispanics and youths in the military in the NLSY79. Data from each interview year include a weight specific to that year. When this weight is applied, the number of sample cases is translated into the number of persons in the population that those observations represent.
The assignment of individual respondent weights involves at least three stages. The first stage involves the reciprocal of the probability of selection at the baseline interview. Specifically, this probability of selection is a function of the probability of selection associated with the household in which the respondent was located, as well as the subsampling (if any) applied to individuals identified in screening. The second stage of weighting adjusts for differential response (cooperation) rates in the screening phase. Differential cooperation rates are computed (and adjusted) on the basis of geographic location and group membership, as well as by group subclassification. The third stage of weighting attempts to adjust for certain types of random variation associated with sampling, as well as sample "undercoverage." In this stage, ratios are estimated and applied so that the weighted sample conforms to independently derived population totals.
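The three stages can be sketched in a few lines of Python. This is only an illustration of the general technique (inverse-probability base weights, a weighting-class response adjustment, and a ratio adjustment to population totals), not the procedure actually used to produce the NLS weights. All function names, cell labels, probabilities, and population totals below are hypothetical.

```python
def base_weight(p_household, p_subsample=1.0):
    """Stage 1: reciprocal of the overall probability of selection."""
    return 1.0 / (p_household * p_subsample)


def response_adjust(weights, cells, responded):
    """Stage 2: within each cell, scale respondents' weights up so that
    they also carry the weight of the nonrespondents in that cell."""
    total, resp = {}, {}
    for w, c, r in zip(weights, cells, responded):
        total[c] = total.get(c, 0.0) + w
        if r:
            resp[c] = resp.get(c, 0.0) + w
    return [w * total[c] / resp[c] if r else 0.0
            for w, c, r in zip(weights, cells, responded)]


def ratio_adjust(weights, groups, population_totals):
    """Stage 3: scale weights so each group's weighted total matches an
    independently derived population total."""
    sums = {}
    for w, g in zip(weights, groups):
        sums[g] = sums.get(g, 0.0) + w
    return [w * population_totals[g] / sums[g] if sums[g] > 0 else 0.0
            for w, g in zip(weights, groups)]


# Hypothetical four-person sample in two cells, "A" and "B".
p_household = [0.01, 0.01, 0.02, 0.02]
p_subsample = [1.0, 0.5, 1.0, 1.0]
cells = ["A", "A", "B", "B"]
responded = [True, False, True, True]
population_totals = {"A": 330.0, "B": 120.0}  # assumed control totals

base = [base_weight(ph, ps) for ph, ps in zip(p_household, p_subsample)]
adjusted = response_adjust(base, cells, responded)
final = ratio_adjust(adjusted, cells, population_totals)
```

In this toy sample the one respondent in cell A ends up carrying the whole weight of that cell, and the final weights sum to the assumed control totals within each cell.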
Subsequent to the initial interview of each cohort, reductions in sample size have occurred due to noninterviews (the failure, for one reason or another, of the person to be interviewed). In order to compensate for these losses, the sampling weights of the individuals who were interviewed had to be revised. A revised weight for each respondent was calculated for each interview year, using the method just described.
To tabulate characteristics of the sample for a single interview year in order to describe the population being represented, it is necessary to weight the observations by using the weights provided. For example, to compute the average hours worked in 1987 by individuals in the NLSY79 (persons born in 1957-1964 and living in the United States in 1978), one weights each respondent's hours worked by that respondent's 1987 sample weight and takes the weighted average. The weights are correct when used in this way.
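As a minimal sketch of the single-year calculation, the Python below computes a weighted mean and the number of persons the sample cases represent. The variable names and the three records are invented for illustration; they are not NLS variable names or real data.

```python
def weighted_mean(values, weights):
    """Weighted average: sum(w * x) / sum(w)."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical records: hours worked and that year's sample weight.
hours_1987 = [40.0, 20.0, 30.0]
weight_1987 = [1000.0, 3000.0, 1000.0]

population_represented = sum(weight_1987)              # 5000.0 persons
unweighted_avg = sum(hours_1987) / len(hours_1987)     # 30.0 hours
weighted_avg = weighted_mean(hours_1987, weight_1987)  # 26.0 hours
```

The weighted and unweighted averages differ here because the second respondent represents three times as many persons as each of the others.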
Often, users confine their analyses to subsamples for which respondents provide valid answers to certain questions. Weighted means here will represent, not the entire population, but rather those persons in the population who would have given a valid response to the specified questions. Nonresponse to any item because of refusals or invalid skips is generally quite small, so the degree to which the weights are incorrect also is probably quite small. In these instances, although the population estimates may be moderately in error, the population distributions (including means, medians, and proportions) are reasonably accurate. Exceptions to this assumption might occur for data items that have relatively high nonresponse rates, such as family income.
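The effect of restricting an analysis to valid responses can be illustrated with a short Python sketch. It reports the weighted mean among valid cases together with the share of the total weight those cases carry. The function name, the use of None to mark an invalid response, and the data are all assumptions made for this example.

```python
def valid_case_summary(values, weights):
    """Summarize an item using only cases with a valid (non-None) answer."""
    valid = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(weights)
    valid_weight = sum(w for _, w in valid)
    if valid_weight <= 0:
        raise ValueError("no valid cases with positive weight")
    return {
        # Mean for persons who would have given a valid response.
        "weighted_mean": sum(v * w for v, w in valid) / valid_weight,
        # Persons represented by valid cases (understates the population).
        "population_covered": valid_weight,
        # Fraction of the full weighted population that is covered.
        "share_covered": valid_weight / total_weight,
    }


# Hypothetical family income item with one invalid response.
income = [30000.0, None, 60000.0, 30000.0]
weights = [1000.0, 2000.0, 1000.0, 1000.0]

summary = valid_case_summary(income, weights)
```

When the covered share is close to 1, population totals from the valid cases are only slightly understated. A low share, as in this deliberately extreme example, is a signal that both totals and means should be treated with caution.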