The redesign of the CPI geographic sample
The selection of new geographic sampling areas ensures that
the 1998 revised Consumer Price Index is representative of
current demographics
The most basic
element of the Consumer Price Index (CPI) decennial revision
program is the selection of new CPI samples. The selection of
geographic areas is the first stage of the CPI’s multistage
sample design. In subsequent stages, BLS analysts select the
outlets (places where area residents make purchases), goods and
services (items purchased), and residents’ housing units.
Historically, the
Bureau of Labor Statistics has used the Office of Management and
Budget’s (OMB) definition of Metropolitan Areas first to
determine the geographic boundary between the metropolitan and
nonmetropolitan areas of the United States for the CPI,^{1} and second to divide the metropolitan
United States into geographic sections called primary sampling
units (hereafter called sampling units). However, there are five
sampling units within the metropolitan area that are not
OMB-designated Metropolitan Areas.^{2}
In the nonmetropolitan area (a total of 77 percent of U.S.
land), BLS forms nonmetropolitan sampling units. In general, a
sampling unit is delineated by county borders (with some
exceptions in New England), and can comprise several counties.
Currently, BLS
publishes the Consumer Price Index for All Urban Consumers
(CPI-U), which covers residents of the metropolitan area, as well
as residents of urban parts of the nonmetropolitan area.^{3} Based on the 1990 census, 87 percent
of the U.S. population is included in the CPI-U definition. In
1989, when planning began for the 1998 revision of the CPI, one
major change envisioned was to publish a Total U.S. Population
Consumer Price Index, the CPI-T. To accommodate this expanded
CPI-T, a larger number of sampling units needed to be selected
throughout the country to represent the previously unrepresented
population.
However, an increase
in the number of selected sampling units entails an increase in
the total cost of the CPI. When the sampling unit selection
process was scheduled to begin in 1993, no decision to publish
the CPI-T had been made. To meet the deadline for sampling unit
selection, BLS decided to use a dual strategy when forming
nonmetropolitan sampling units and determining how many sampling
units to select from each of the four census regions.
This article
describes the area selection process for the 1998 CPI revision.
The basic steps in the geographic area selection process are:
 Determine sample classification variables
 Construct sampling units
 Classify sampling units by population size and allocate sample
 Determine stratification variables within each region’s size class and stratify non-self-representing sampling units
 Select sampling units for the CPI geographic sample
These steps are
basically the same as those followed for the 1987 revision. This
article highlights how the 1998 revision methodology and the
final sample design differ from the previous revision.^{4}
Determine sample classification variables
In both the 1987 and 1998 sample designs, sampling units were
classified first by location, based on one of the four census
regions: Northeast, Midwest, South, and West. In the
1987 design, population size, the second classification variable,
had four classes; whereas in the 1998 design it has three.
For the metropolitan
area, the population size class variable is used to designate
self-representing sampling units (areas which have a large enough
population to be selected for the sample with certainty) and
non-self-representing sampling units (areas which are randomly
selected to represent themselves as well as other metropolitan
areas not selected for the sample). Both the 1987 and the 1998
designs have one size class for self-representing metropolitan
sampling units (A size class). The 1987 design used two size
classes for non-self-representing metropolitan sampling units by
drawing a distinction between medium (B size class) and small (C
size class) non-self-representing metropolitan sampling units, and
the population boundary depended on the census region of the
sampling unit.^{5} These two
population-size classes were combined in the 1998 design. The
decision to have just one population class (designated as B/C) of
non-self-representing metropolitan sampling units eliminated the
difficulty of defining the population boundary between small and
medium metropolitan sampling units, as encountered in the 1987
revision. (See exhibit 1.)
In the 1987 sample
design, an additional class variable (urban or rural
nonmetropolitan) was required because the geographic areas
selected for the CPI-U were also used in the Consumer Expenditure
Survey. The definition of "population" in the Consumer
Expenditure Survey includes the total nonmetropolitan
population, both urban and rural, whereas the CPI-U
population definition includes only the urban parts of the
nonmetropolitan area. In the 1987 design, in order to support the
expenditure survey’s total population definition and the
more restrictive CPI-U definition, the sample design in the
expenditure survey required two nonmetropolitan
classes: urban and rural nonmetropolitan. The nonmetropolitan
area for the 1987 design was first divided into urban and rural
areas. Then the urban area was divided into urban sampling units,
which were sampled simultaneously for the CPI and the expenditure
survey. Subsequently, the rural area was divided into rural
sampling units from which the rest of the sampling units for the
expenditure survey were chosen. The map in exhibit
2 illustrates the size of the nonmetropolitan land
area.^{6} In the 1998 design, this
dichotomy was not required, because nonmetropolitan sampling
units were sampled from the total nonmetropolitan area, based on
the CPI-T population definition. If a decision was made not to
publish a CPI-T after the selection of the nonmetropolitan CPI-T
sampling units, urban parts of a subsample of these units would
become the nonmetropolitan CPI-U sampling units. However, the
selection of CPI-U sampling units and the proportion of the CPI-U
population they represent are based on the CPI-T sampling unit
selection.
Construct sampling units
For the 1998 revision, the nonmetropolitan sampling units were
formed from counties (or from minor civil divisions in Hawaii and
in all six New England States). To create a potential sampling
unit containing some urban consumer units,^{7}
5,000 urban consumer units were necessary per sampling unit,
while 5,000 rural consumer units were needed if the potential
sampling unit contained no urban consumer units. This sampling
unit population size is required in order to have enough consumer
units to support the various household surveys using this
design (the Consumer Expenditure Survey, the Continuing
Point-of-Purchase Survey, and the CPI Housing Survey) without
unduly burdening respondents. All counties in the sampling units
had to be contiguous, and a reasonable attempt was made to stay
within State boundaries. In some areas, it was impossible to find
contiguous counties with either more than 5,000 urban consumer
units or more than 5,000 rural consumer units with no urban
consumer units. In these cases, BLS eventually formed some
sampling units containing fewer than 5,000 urban consumer units
but at least 5,000 total consumer units. For
example, the combination of Lake and Cook counties in
northeastern Minnesota contains 6,353 consumer units, but only
1,665 urban units. If the CPI-T were abandoned and the urban part
of one of these sampling units were selected for the CPI-U, BLS
planned to add urban parts of neighboring sampling units in the
same stratum to be used only for the CPI Housing Survey sample.^{8} (Details on stratifying sampling units
into classes are discussed later in this article.)
ATLASGIS (geographic
information system) mapping software, which drew computer maps
overlaid with the relevant census population data, was employed
in this sampling unit formation. This software also was used to
derive the sampling unit location variables—longitude and
latitude—employed in sampling unit stratification.
Classify units by population; allocate sample
Census region and population size are used to partition all of
the sampling units into a total of 12 classes: the four
census regions and three population-size classes within each
region. The CPI’s sample allocation consists of determining
how many sampling units will be sampled from each of these 12
regional size classes. The combination of sampling unit classification
and sample allocation is an iterative process that is constrained by
budget as well as by index continuity and publication considerations,
which are discussed below.
Classifying metropolitan sampling units. After
sampling units are formed, BLS determines the population boundary
between self-representing and non-self-representing
metropolitan sampling units. This process is subject to both
budget constraints and CPI users’ needs. Sampling units
included in the current 1987 design are efficient in terms of
program costs and users’ needs. Continuing sampling units
are less expensive to resample because trained data collection
staff are already available in these areas. CPI users want the
current class A (self-representing) sampling units to remain as
they are because published indexes are available for most of
these areas individually.^{9} To balance
this desire with the mandate to keep data collection costs under
control by limiting the number of new sampling units, BLS
classified all sampling units with populations greater than 1.5
million as class A (self-representing) units for the 1998
revision.^{10} Honolulu and Anchorage
remain class A sampling units because their geographic locations
make price change in these consumer markets unique. The
self-representing sampling units form 4 of the 12 regional size
classes and include 31 sampling units. All Metropolitan Areas not
included in the class A sampling units were classified as class
B/C (non-self-representing metropolitan), and all nonmetropolitan
sampling units were classified as either class Y or class Z.
Exhibit 1 contrasts the 1987 size classifications for sampling
units in the CPI and expenditure survey with those in the 1998
revised CPI-U and the 1996 total population Consumer Expenditure
Survey. The budget for the 1998 revised CPI required that the
sample size remain the same as the current one. This meant that
there would be 74 non-self-representing sampling units chosen,
with 18 of them not priced for a CPI-U, but only surveyed for
consumer expenditure data.
Dual strategy for sample allocation. BLS
considered many sample allocation strategies to make sure that
the final sample allocation for the Consumer Expenditure Survey
and the proposed CPI-T had regional size class samples that were
as proportional to population size as possible, while still being
adaptable to a CPI-U. The selected strategy first declared that
the CPI-U and expenditure survey would have the same selected
class A and class B/C sampling units. The next step was to
allocate the number of sample non-self-representing metropolitan
and nonmetropolitan sampling units (74) to the remaining eight
regional size classes, proportional to their total populations.
(For example, the number allocated to the West B/C size class
should be approximately equal to 74 times the population in the
West B/C size sampling units, divided by the total population in
non-self-representing sampling units.) The CPI-U and the
expenditure survey each contain 46 non-self-representing
metropolitan sampling units.
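The proportional rule in the parenthetical example above can be sketched in Python. This is only an illustration: the population figures are hypothetical placeholders (the article does not report them), and the largest-remainder rounding is an assumption. The actual allocation was also adjusted iteratively for budget, continuity, and publication constraints.

```python
def proportional_quota(class_pop, total_pop, total_units=74):
    """Unrounded share of the sample for one regional size class."""
    return class_pop * total_units / total_pop


def allocate(populations, total_units=74):
    """Round proportional quotas to integers that sum to total_units.

    Largest-remainder rounding is an assumption made for illustration;
    the article does not describe the rounding rule BLS used.
    """
    total_pop = sum(populations.values())
    quotas = {name: proportional_quota(pop, total_pop, total_units)
              for name, pop in populations.items()}
    alloc = {name: int(q) for name, q in quotas.items()}
    leftover = total_units - sum(alloc.values())
    # Hand the remaining units to the classes with the largest fractions.
    ranked = sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)
    for name in ranked[:leftover]:
        alloc[name] += 1
    return alloc


# Hypothetical populations (millions) for three classes only.
example = {"West B/C": 12.0, "South B/C": 45.0, "Midwest B/C": 20.0}
print(allocate(example, total_units=10))
```

The rounded counts always sum to the requested total, which is the property the allocation step needs before the even-number pairing requirement is applied.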
To prepare for the
possibility of producing an urban-only CPI, BLS adopted the
strategy of classifying all nonmetropolitan sampling units into
one size class and of selecting 28 nonmetropolitan units. If,
after the selection, it was decided that the CPI would use the
CPI-U population definition rather than the CPI-T definition, the
selected nonmetropolitan sampling units would be divided into two
classes, class Y and class Z. The CPI-U would use urban parts of
10 of the 28 selected nonmetropolitan sampling units to represent
the urban nonmetropolitan population; these urban parts would be
designated as D sample units in the CPI-U. The 10 sample units
that contain these urban parts are called Y sample units. The
expenditure survey would use these Y sample units and the
remaining 18 nonmetropolitan sample units (called Z sample units)
to represent the total nonmetropolitan population.
The method used to
classify the selected sampling units as class Y or Z was
iterative. First, the chosen nonmetropolitan sampling units with
no urban population would become Z sample units. Then, from the
remaining selected nonmetropolitan sampling units, a total of 10
would be chosen to be classified as Y sampling units with
probability proportional to the urban population of their strata.
This selection was performed in each region, based on the number
of nonmetropolitan sampling units allocated to each region. This
is illustrated in table 1 in the row
labeled D (Y for the expenditure survey). Finally, the remaining
nonmetropolitan sample units would also be classified as Z units.
In addition, the sampling unit’s percent urban population
would be used as a stratifying variable to ensure that the units
in each stratum were as alike as possible on this variable. The
number of sample Z units in each region was determined by the
region’s rural nonmetropolitan population.
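Selection with probability proportional to size (PPS), as used above to choose the Y units, can be illustrated with a systematic PPS draw. This is a sketch under stated assumptions: the article does not say which PPS procedure BLS used, and the unit names and urban populations below are hypothetical.

```python
import random


def systematic_pps(units, n, rng):
    """Pick n units with probability proportional to size.

    units: list of (name, size) pairs with positive integer sizes.
    A unit larger than the sampling interval could be hit more than
    once; the toy example below avoids that case.
    """
    total = sum(size for _, size in units)
    step = total / n
    start = rng.random() * step              # random start in [0, step)
    targets = [start + k * step for k in range(n)]
    picks, cumulative, i = [], 0, 0
    for name, size in units:
        cumulative += size
        # A unit is picked when a target falls inside its size interval.
        while i < n and targets[i] < cumulative:
            picks.append(name)
            i += 1
    return picks


# Hypothetical strata urban populations (thousands).
candidates = [("unit_a", 50), ("unit_b", 30), ("unit_c", 20)]
chosen = systematic_pps(candidates, 2, random.Random(0))
```

With these sizes the sampling interval is 50, so `unit_a` is always chosen and the second pick is `unit_b` with probability 0.6 and `unit_c` with probability 0.4, matching their relative sizes.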
With the exception of
food and energy items, the CPI collects prices in most sampling
units^{11} every other month; this is
known as bimonthly pricing. Bimonthly pricing makes it necessary
to pair each selected non-self-representing metropolitan and
nonmetropolitan sampling unit priced in odd months with a
sampling unit in the same regional size class priced in even
months, so that each region’s monthly B/C and D size class
indexes represent approximately the same size populations. Thus,
each region’s B/C and D size class must have an even number
of sampled units. Index publication requires calculation of index
variances. (See "Publication strategy
for the 1998 revised Consumer Price Index.") Variance
calculation of a particular region’s B/C and D size class
index also requires that sampling units in that size class be
paired with each other (each pair is called a replicate) and that
there are at least two replicates in that non-self-representing
size class.^{12} Thus, index
publication requires that each published non-self-representing
regional size-class index area has an even number of sampling
units, amounting to at least four.
Table
1 presents the proportional-to-population size sample
allocation to the regional size classes for the 1998 geographic
area design. The 31 class A sampling units in table
1 represent 46 percent of the total population and 53 percent
of the CPI-U population. Also of note is the fact that there are
74 non-self-representing sampling units for a CPI-T and 56 for a
CPI-U.
Comparing the
sampling unit allocation in table 1 with
the publication requirements (mentioned earlier), we see that the
nonmetropolitan CPI-U indexes (size class D) for the Northeast
and West will not be published when the 1998 area design is used
to produce the January 1998 index. (Currently, no Northeast or
West nonmetropolitan urban indexes are published.) These regional
size classes do not meet publication requirements, which require
a minimum of four sampling units. However, for a total CPI, a
combined Y and Z class (nonmetropolitan) index could have been
published in every region. Because the Boston sampling unit has
absorbed almost all of the previously nonmetropolitan urban
population in the Northeast, that region did not qualify to have
even one selected D sampling unit.
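The publication rule can be checked directly against the class D, Y and Z counts reported in table 1. The short Python sketch below uses those counts; the function name and the dictionary layout are illustrative choices, not BLS terminology.

```python
# Selected sampling units by region, taken from table 1.
D_UNITS = {"Northeast": 0, "Midwest": 4, "South": 4, "West": 2}
Z_UNITS = {"Northeast": 4, "Midwest": 4, "South": 8, "West": 2}


def is_publishable(n_units, min_units=4):
    """A size-class index needs an even number of units, at least four."""
    return n_units >= min_units and n_units % 2 == 0


# Urban-only nonmetropolitan index (class D) under the CPI-U.
cpiu_d = {region: is_publishable(n) for region, n in D_UNITS.items()}

# Combined Y and Z nonmetropolitan index under a total-population CPI.
combined = {region: is_publishable(D_UNITS[region] + Z_UNITS[region])
            for region in D_UNITS}
```

Under these counts the class D index fails the rule in the Northeast and West, while the combined Y and Z classes satisfy it in every region, which agrees with the discussion above.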
Stratify sampling units into classes
The next phase of the sampling unit selection for the CPI-T
was to stratify (group) the units in each region’s size
class (for example, South B/C) into strata (groups) of similar
sampling units based on their scores on several stratifying
variables. The number of strata is the same as the number of
sampling units to be selected because one sampling unit is chosen
from each stratum. Each class A sampling unit is in a stratum by
itself; thus the name, self-representing. Selection of the
stratifying variables to stratify a region’s B/C and D size
classes was based on linear regression modeling of 1987 through
1992 price change for various time intervals. The independent
variables used in this modeling were subsets of 1990 census and
geographic sampling unit variables. How well CPI price change was
explained by these models was measured by percent R^{2}.^{13} Table 2
exhibits percent R^{2} values for three competing models
of sampling unit price change of various time lags. Data used
were from current class A sample units, excluding Anchorage and
Honolulu. (Anchorage and Honolulu sample units are statistical
outliers because they are geographically removed from the
contiguous United States and also demographically different.)
The geographic model
consists of four independent variables: normalized
(centered and scaled by the range) longitude, the square of
normalized longitude, normalized latitude, and percent urban. The
two other comparison models, which use census variables, are the
7-variable model, which contains the seven variables of the 1987
revision stratification^{14} along with
percent urban, and an 11-variable model. Note that the R^{2}
values for the geographic model are larger than those for the
7-variable model and smaller than those of the 11-variable model.
Taking into account that the latter model uses 11 variables and
the geographic model employs just 4, the geographic model was
judged best because it was simpler and more understandable. The
independent variables used in it will be available for future
revisions. The strong performance of the 4-variable geographic
model is attributed to the model’s high explanatory power
for selected variables within the 11-variable model. For example,
table 3 contains the 6 of these 11
variables with the largest percent R^{2} obtained when
each census variable was modeled by the set of variables in the
geographic model. County 1990 census data for the 48 contiguous
States were used in this analysis.
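The four regressors of the geographic model described above can be constructed as in the following Python sketch. It assumes that "centered" means centered on the mean, which the article does not state explicitly, and the coordinates shown are made-up values.

```python
def normalize(values):
    """Center on the mean and scale by the range (max minus min)."""
    span = max(values) - min(values)
    if span == 0:
        raise ValueError("cannot normalize a constant variable")
    mean = sum(values) / len(values)
    return [(v - mean) / span for v in values]


def geographic_features(longitudes, latitudes, pct_urban):
    """Return one (lon, lon squared, lat, percent urban) row per unit."""
    lon = normalize(longitudes)
    lat = normalize(latitudes)
    return [(x, x * x, y, u) for x, y, u in zip(lon, lat, pct_urban)]


# Hypothetical sampling-unit centroids and percent-urban values.
rows = geographic_features([-120.0, -100.0, -80.0],
                           [30.0, 40.0, 45.0],
                           [85.0, 60.0, 92.0])
```

Scaling by the range keeps the normalized longitude and latitude within an interval of width one, so the squared longitude term stays on a comparable scale.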
Another consideration
when choosing stratification variables is the resulting expected
overlap (the expected number of old sampling units in the new
design). The 1987 geographic sample contained 45 sampling units
that were eligible for reselection as part of the new sample of
46 B/C sampling units. Of these, two (Buffalo and New Orleans)
were former class A sampling units that were no longer
self-representing in the new design. Subject to the requirement
of obtaining a statistically representative sample, choosing a
stratification that will increase the expected number of
reselected sampling units avoids unnecessary training and other
personnel costs. Because one sampling unit is selected from each
stratum, the expected overlap can be computed once the
stratification has been completed. Several stratifications of the
metropolitan non-self-representing regional sampling units were
completed using the variables in these models with various
weights on the variables.^{15} Table 4 exhibits the expected numbers of
overlap sampling units found in the best of these stratifications
using approximate definitions of Metropolitan Areas.
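Because exactly one unit is drawn per stratum, the expected overlap is the sum, over strata, of the selection probabilities of the current sampling units in each stratum. The Python sketch below assumes plain probability-proportional-to-population selection (before any Keyfitz adjustment) and uses hypothetical strata.

```python
def expected_overlap(strata):
    """Expected number of current (old-design) units that are reselected.

    strata: list of strata; each stratum is a list of dicts with keys
    "pop" (population) and "current" (True if in the old sample).
    """
    total = 0.0
    for stratum in strata:
        stratum_pop = sum(unit["pop"] for unit in stratum)
        current_pop = sum(unit["pop"] for unit in stratum if unit["current"])
        # Probability that the single draw lands on a current unit.
        total += current_pop / stratum_pop
    return total


# Hypothetical stratification with three strata.
toy_strata = [
    [{"pop": 600, "current": True}, {"pop": 400, "current": False}],
    [{"pop": 500, "current": False}, {"pop": 500, "current": False}],
    [{"pop": 250, "current": True}, {"pop": 750, "current": False}],
]
overlap = expected_overlap(toy_strata)
```

Comparing this number across candidate stratifications is what allows the overlap to be evaluated before any sample is actually drawn.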
As shown in the third
column (7-variable/unequal) of table 4,
the stratification using the seven 1987 revision variables along
with their 1987 weights and percent urban with a weight of 1 gave
the largest and, thus, the most desirable expected overlap.^{16} The second column of the table
(7-variable/equal) is the overlap expected when using the same
variables with equal weights. The fourth column
(geographical/equal) is the expected overlap when stratifying
with the geographic variables with equal weights. The last column
(mixed/equal) shows the results of a mixed stratification scheme
with equal weights.
The last row in table 4 shows the range of the possible
number of overlap class B/C sampling units for each set of
(weighted) stratifying variables. Note that after stratification,
BLS "Keyfitzed"^{17} each
sampling unit’s probability of selection from a B/C stratum
to improve the possibility that a current sampling unit in the
stratum would be reselected, while reflecting shifts in sampling
unit populations between censuses. For example, if a 1998
revision stratum contains the same sampling units as a 1987
stratum and a current sampling unit in that stratum has a
probability of selection (1990 sampling unit population divided
by 1990 stratum population) which is greater than or equal to its
1987 probability of selection, then its Keyfitzed probability of
being selected from that stratum is 1 and it is selected with
certainty.
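The example above matches the textbook form of the Keyfitz procedure, sketched here in Python for a stratum whose membership is unchanged between designs. The branch for a unit whose probability has fallen, and the redistribution to units whose probability has risen, follow the standard published rule and are assumptions here; the article describes only the certainty case. The probabilities are hypothetical.

```python
def keyfitz_conditional_probs(old_probs, new_probs, previously_selected):
    """Selection probabilities given which unit was chosen last time.

    old_probs, new_probs: dicts mapping unit name to its selection
    probability in the old and new designs (each sums to 1).
    """
    j = previously_selected
    keep = min(1.0, new_probs[j] / old_probs[j])
    probs = {unit: 0.0 for unit in new_probs}
    probs[j] = keep
    if keep < 1.0:
        # Spread the remaining probability over units that gained,
        # in proportion to the size of each gain.
        gains = {u: new_probs[u] - old_probs[u]
                 for u in new_probs if new_probs[u] > old_probs[u]}
        total_gain = sum(gains.values())
        for unit, gain in gains.items():
            probs[unit] += (1.0 - keep) * gain / total_gain
    return probs


old = {"a": 0.50, "b": 0.30, "c": 0.20}
new = {"a": 0.25, "b": 0.45, "c": 0.30}
kept_b = keyfitz_conditional_probs(old, new, "b")   # probability rose
kept_a = keyfitz_conditional_probs(old, new, "a")   # probability fell
```

Averaging these conditional probabilities over the old selection probabilities reproduces the new probabilities, so the updated design stays proportional to current population while retaining old units wherever possible.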
The final solution
was to use the variables in the geographic model for
stratification of the B/C sampling units in the Northeast, West,
and Midwest, and also for all of the nonmetropolitan sampling
units. The seven variables (with equal weights) used for the
previous revision along with percent urban were employed to
stratify the South B/C sampling units, because too much overlap
would have been lost otherwise. This is the mixed stratification
and expected overlap in the last column of table
4.
There are several
advantages to using the four geographic variables for
stratification. The variables will not change very much over
time. This will lead to much better overlap values in the next
revision, as the stratifications will be basically the same. In
addition, a complete change in stratifying variables will
eventually have to be made because census 2000 will probably not
collect data necessary to construct the 1987 variables, but the
geographic variables will definitely be available for the next
CPI revision from the ATLASGIS software. The program used to do
the stratifications is a modified version of the Friedman-Rubin^{18} clustering algorithm,
which puts sampling units in the same strata based on their
similarities on the stratification variables, while keeping the
population sizes of the strata approximately equal.
Stratification results. For each of the
eight census regional size classes of non-self-representing
sampling units (B/C and nonmetropolitan), 20 stratifications were
completed. In each class, the final stratification was
the one with the smallest sum, over all stratifying variables, of
the within-stratum variances between sampling units. This
number measures how close the sampling units in each stratum are
with regard to their values on the stratifying variables.
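The selection criterion can be sketched in Python as a weighted sum of within-stratum variances, with the smallest score winning. The use of the population variance, the absence of population weighting, and the variable names are assumptions for illustration; the article gives only a verbal description of the statistic.

```python
from statistics import pvariance


def stratification_score(strata, weights):
    """Weighted sum of within-stratum variances; smaller is better.

    strata: list of strata, each a list of dicts of variable values.
    weights: dict mapping each stratifying variable to its weight.
    """
    score = 0.0
    for variable, weight in weights.items():
        for stratum in strata:
            values = [unit[variable] for unit in stratum]
            score += weight * pvariance(values)
    return score


def best_stratification(candidates, weights):
    """Pick the candidate stratification with the smallest score."""
    return min(candidates, key=lambda s: stratification_score(s, weights))


# Two hypothetical ways of grouping four units on one variable.
tight = [[{"lon": 0.0}, {"lon": 0.1}], [{"lon": 1.0}, {"lon": 1.1}]]
loose = [[{"lon": 0.0}, {"lon": 1.0}], [{"lon": 0.1}, {"lon": 1.1}]]
winner = best_stratification([loose, tight], {"lon": 1.0})
```

In this toy case the grouping that keeps similar units together scores lower and is selected, mirroring how the final stratification was chosen from the 20 completed for each class.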
The distribution of
the number of sampling units in each final regional B/C stratum
is fairly uniform, with strata containing two sampling units being
made up of either two formerly B-sized sampling units or a
formerly A-sized sampling unit and a formerly C-sized sampling
unit. The B/C strata containing the larger number of sampling
units are made up entirely of formerly C-sized sampling units.
The expected total overlap among the B/C sampling units ranges
between 19 and 23.
Select sampling units
A program was used to select one sampling unit per stratum so
that the selected CPI-T sampling units are well distributed over
the States and that there are many current sampling units among
the newly selected ones. When the decision to publish only a
CPI-U was made, the previously outlined strategy was implemented.
As a result, some selected nonmetropolitan areas that had urban
population in their strata were designated as Z sampling units. To
account for the Z strata urban population in the CPI-U
publication indexes, each selected Z sampling unit containing
urban population was paired with a chosen geographically close D
sample unit (B/C sample unit in the Northeast) in the same
region. The urban stratum population of each Z sampling unit was
then added to the stratum population of its paired CPI-U sampling
unit to calculate the CPI-U population represented by each D (B/C
in the Northeast) sample unit in the pair. These population
numbers are used to calculate the percent of index population
shown in Appendix 2.
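The population arithmetic for a paired D and Z unit is simple enough to show in a few lines of Python. All figures below are hypothetical; the actual populations appear in Appendix 2.

```python
def cpiu_population_represented(d_stratum_pop, paired_z_urban_pops):
    """CPI-U population carried by a D unit after pairing.

    d_stratum_pop: stratum population of the D (or B/C) sample unit.
    paired_z_urban_pops: urban stratum populations of the Z units
    paired with it.
    """
    return d_stratum_pop + sum(paired_z_urban_pops)


def percent_of_index_population(unit_pop, total_index_pop):
    """Share of the index population represented by one sample unit."""
    return 100.0 * unit_pop / total_index_pop


# Hypothetical figures: one D unit paired with one Z unit.
represented = cpiu_population_represented(400_000, [50_000])
share = percent_of_index_population(represented, 9_000_000)
```

Here the D unit would represent 450,000 people, or 5 percent of a hypothetical index population of 9 million.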
Of the 46 final B/C
strata, 32 contained at least one sampling unit from the current
sample. A current sampling unit was selected in 21 of these 32
strata; that is, the amount of overlap in the new CPI-U
non-self-representing sample is 21 sampling units. The map in
exhibit 3 shows, by size class, all counties contained in the CPI-U
sample in the contiguous U.S. (Honolulu and Anchorage are not shown).
Appendix 2 (presented separately for the Northeast, Midwest, South,
and West census regions) shows the names of sampling
units selected for the 1998 revised CPI-U and the counties contained
therein. The sample contains 36 new sampling units: 1
in class A (Phoenix), 25 in class B/C, and 10 in class D. Prices
from these 36 sampling units will be introduced into CPI index
calculations with the release of the January 1998 index. The
appendix also gives the percent of the CPI-U population
represented by each selected sampling unit along with its pricing
cycle.
Exhibit 1. Size classifications of sampling units in CPI and Consumer Expenditure Surveys, 1987, 1996, and 1998

| Sampling unit | 1987 CPI-U and Consumer Expenditure Survey^{1}: Class | 1987 CPI-U and Consumer Expenditure Survey^{1}: Definition | 1996 Consumer Expenditure Survey (CPI-T): Class | 1996 Consumer Expenditure Survey (CPI-T): Definition | 1998 revision CPI-U^{2}: Class | 1998 revision CPI-U^{2}: Definition |
| --- | --- | --- | --- | --- | --- | --- |
| Self-representing metropolitan | A | Metropolitan Areas with 1980 population greater than 1.2 million^{3} | A | Metropolitan Areas with 1990 population greater than 1.5 million^{3} | A | Metropolitan Areas with 1990 population greater than 1.5 million^{3} |
| Non-self-representing metropolitan | B | Medium Metropolitan Areas^{4} | B/C | Metropolitan Areas with 1990 population of 1.5 million or less | B/C | Metropolitan Areas with 1990 population of 1.5 million or less |
| | C | Small Metropolitan Areas^{4} | | | | |
| Nonmetropolitan | D | (Urban only) | Y and Z | Represent total nonmetropolitan population | D | Represent urban nonmetropolitan population |
| | T | (Rural only) Consumer Expenditure Survey only | | | | |

^{1} Current class B publication indexes include prices from the class B sampling units and Honolulu, while the current class C publication indexes include prices from the class C sampling units and Anchorage.
^{2} The basic publication index names and composition for the 1998 revision are shown in Appendix 2. The West B/C class index will include all B/C sampling units in the West, along with Honolulu and Anchorage.
^{3} Anchorage and Honolulu are class A sampling units with smaller populations.
^{4} For the 1987 revision, classes B and C population size boundaries vary by census region.
Exhibit 2. Metropolitan and nonmetropolitan areas in the contiguous United States, December 1992
Exhibit 3. Class size of selected CPI-U primary sampling units in the continental United States, 1998
Table 1. Regional distribution of selected sample units, 1998 revision

| Size class | Total | Northeast | Midwest | South | West |
| --- | --- | --- | --- | --- | --- |
| Total, CPI-U | 87 | 14 | 22 | 33 | 18 |
| A | 31 | 6 | 8 | 7 | 10 |
| B/C | 46 | 8 | 10 | 22 | 6 |
| D (Y for CES) | 10 | 0 | 4 | 4 | 2 |
| Total, CES | 105 | 18 | 26 | 41 | 20 |
| Z (CES only) | 18 | 4 | 4 | 8 | 2 |

Note: CES = Consumer Expenditure Survey.
Table 2. Percent price change variance explained by models

| Interval of price change | Geographical (4-variable) model | 7-variable model | 11-variable model |
| --- | --- | --- | --- |
| 6 months | 40.23 | 34.28 | 47.69 |
| 1 year | 28.66 | 21.07 | 28.89 |
| 2 years | 46.26 | 30.22 | 65.38 |
| 3 years | 53.01 | 24.73 | 66.31 |
| 4 years | 63.01 | 44.91 | 79.15 |
| 5 years | 68.97 | 53.37 | 83.71 |
Table 3. Percent variance of some census variables in the 11-variable model, explained by the variables in the geographic model

| Census variable | Percent variance explained |
| --- | --- |
| Percent fuel oil heated housing units | 81.34 |
| Percent gas heated housing units | 70.47 |
| Mean contract rent | 54.01 |
| Percent electric heated housing units | 47.20 |
| Percent two or more wage earner consumer units | 39.82 |
| Percent black consumer units | 39.09 |
Table 4. Expected overlap using various stratifying variables with equal and unequal weights for class B/C sampling units, by region

| Region | 7-variable/equal | 7-variable/unequal | Geographical/equal | Mixed/equal |
| --- | --- | --- | --- | --- |
| United States | 20.07 | 21.44 | 18.22 | 20.43 |
| Northeast | 3.89 | 4.70 | 4.60 | 4.60 |
| Midwest | 3.44 | 3.78 | 2.91 | 2.91 |
| South | 10.17 | 10.30 | 7.96 | 10.17 |
| West | 2.57 | 2.66 | 2.75 | 2.75 |
| U.S. range | 18–22 | 19–23 | 15–19 | 18–22 |
Acknowledgment:
The author thanks John Greenlees, Marybeth Tschetter and
members of the CPI Survey Research and Analysis Branch of the
Prices Statistical Methods Division who contributed to the final
versions of this article and Appendix 2. In particular, David
Swanson coordinated the final editing of the electronic versions
of both this article and Appendix 2 and William Johnson created
the printed and electronic versions of this article's map.
Footnotes
^{1} Each of the decennial
census-based Metropolitan Areas is either a Metropolitan
Statistical Area, Primary Metropolitan Statistical Area, or
Consolidated Metropolitan Statistical Area. For more information,
see the Statistical Policy Office of the Office of Management and
Budget (OMB) Attachments to OMB Bulletin No. 93–05, Metropolitan
Areas 1992, Lists I–IV. The CPI metropolitan area
includes all OMB-designated Metropolitan Areas.
^{2} The five sampling units in the
metropolitan area that are not OMB-designated Metropolitan Areas
are the Los Angeles suburbs, CA, sampling unit, the three
sampling units that together form the New York-Northern New
Jersey-Long Island, NY–NJ–CT–PA publication area,
and the Washington, DC–MD–VA–WV sampling unit.
(See Appendix 2.)
^{3} BLS also publishes the
CPI-W, which covers urban wage earners and clerical workers.
^{4} A more detailed description
of the current and 1998 revision area sample selection is
contained in Cathryn S. Dippo and Curtis A. Jacobs, "Area
Sample Redesign for the Consumer Price Index," Proceedings
of the Survey Research Methods Section (American Statistical
Association, 1983), pp. 118–23; and J. L. Williams, E. F.
Brown, and G. R. Zion, "The Challenge of Redesigning the
Consumer Price Index Area Sample," Proceedings of the
Survey Research Methods Section, vol. 1 (American Statistical
Association, 1993), pp. 200–05.
^{5} In 1987, the census region
population boundaries between C and B sampling unit population
sizes were (in thousands): Northeast–500,
Midwest–360, South–450, and West–330.
^{6} This map shows the
contiguous U.S. metropolitan area. Anchorage and Honolulu are the
only Metropolitan Areas not shown.
^{7} A consumer unit consists of
one of the following: (1) all members of a particular
housing unit who are related by blood, marriage, adoption, or
some other legal arrangement, such as foster children; (2) two or
more unrelated persons living together who pool their income to
make joint expenditure decisions; or (3) a person living alone or
sharing a household with others, or living as a roomer in a
private home, lodging house, or in permanent living quarters in a
hotel or motel, but who is financially independent and is not
included in (2). A student living in university-sponsored housing
is included in the sample as a separate consumer unit.
^{8} Four sampling units of this
type are in the sample—two in the Midwest and two in the
South.
^{9} All current A sampling units
are published except those which are part of A101 (New
York-Northern New Jersey-Long Island, NY–NJ–CT–PA)
and A421 (Los Angeles-Riverside-Orange County, CA). These are
published together as A101 and A421, respectively. The Office of
Management and Budget calls A101 and A421 Consolidated
Metropolitan Statistical Areas.
^{10} This decision classified two
current A sampling units, Buffalo and New Orleans, as B/C
sampling units. In addition, Phoenix, a 1987 class A sampling
unit, which was dropped in 1988 due to budget cuts, is a new
class A sampling unit. However, a Phoenix index will not be
published individually.
^{11} For the 1998 revision,
prices will be collected monthly in just three A areas—A101,
A421, and A207 (the New York, Los Angeles, and Chicago
Consolidated Metropolitan Statistical Areas).
^{12} For information on
replicates and how they are used in CPI variance calculation, see
Sylvia Leaver and Richard Valliant, "Chapter
28: Statistical Problems in Estimating the U.S.
Consumer Price Index," Business Survey Methods (New
York, John Wiley & Sons, Inc., 1993).
^{13} Values of R^{2}
never decrease, and typically increase, as more independent
variables are added to a model.
^{14} The 1987 stratifying
variables were: mean interest and dividend income per consumer
unit, mean consumer unit wage and salary income, percent housing
units heated by electricity, percent housing units heated by fuel
oil, percent owner occupied housing units, percent black consumer
units, and percent consumer units with a retired person.
^{15} The weights used for the
1987 stratification were 0.5 on each of the non-income variables
and 1 on each of the two income variables. A variable’s
weight is used as a multiplier of a statistic calculated to judge
how close every stratum’s sampling units are on this
particular variable. These products are then summed over all of
the stratifying variables. The resulting number is used to judge
how good a particular weighted stratification is. The smaller the
number, the better the stratification. See Dippo and Jacobs,
footnote 4.
^{16} See footnote 15.
^{17} See Dippo and Jacobs,
footnote 4, for more information on this technique.
^{18} See D. Kostanich, D.
Judkins, R. Singh, and M. Schautz, "Modification of
Friedman-Rubin’s Clustering Algorithm for Use in Stratified
PPS Sampling," Proceedings of the Survey Research Methods
Section (American Statistical Association, 1981), pp.
285–90.
Janet L. Williams is a branch chief in the Division of Price
Statistical Methods, Bureau of Labor Statistics.
Last Modified Date: October 16, 2001