Hedonic Quality Adjustment Methods for College Textbooks in the U.S. CPI
The College textbooks item category has been in the Consumer Price Index (CPI)
since 1964. College textbooks, along with Elementary/high school textbooks,
and Reference books make up the Consumer Price Index aggregate index Educational
books and supplies. Educational books and supplies had a relative importance of
.196 in the CPI-U All Items, as of December 1999. The Educational books and supplies
index is published monthly at the U.S. level, while none of the aggregate's three
components are published. College textbooks is by far the largest component.
College textbooks includes any book that, according to the outlet, has been
designated by the college, department, or professor as a required text for a course
offered by the college during the academic period. Only new books are priced. Used books
are included in the item category's weight, but excluded from pricing to avoid the
difficulty of finding comparable items over time.
College textbooks has a relatively high number of replacements (which occur when
the book that has been followed is no longer sold in the outlet), and in many cases the
replacement is not comparable to its predecessor. For example, over the one-year
period from June 1998 to May 1999, the CPI priced a total of 948 quotes for the College
textbooks category. Of this full year of quotes, 113 (12%) were replacements.
Of the 113 replacements, 40 quotes (35%) were deemed to be either comparable or able to be
quality adjusted, and thus could be used in the CPI. The remaining 73 quotes (65%) were
not comparable and were routed to other processing, in which price change is estimated
from the price movement of comparable replacement items. Ultimately, this
meant that 1 out of every 13 priced quotes in this item category over the course of a year
was a non-comparable replacement. These figures led to the conclusion that College
textbooks more than qualified as a candidate for hedonic regression analysis.
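The shares quoted above follow directly from the reported counts. A short Python check, using only the figures given in the text (the variable names are illustrative):

```python
# Quote counts reported in the text for June 1998 through May 1999.
total_quotes = 948
replacements = 113
usable_replacements = 40   # comparable or able to be quality adjusted
non_comparable = 73        # sent to other processing

replacement_share = replacements / total_quotes
usable_share = usable_replacements / replacements
non_comparable_share = non_comparable / replacements
quotes_per_non_comparable = total_quotes / non_comparable

print(f"Replacements as a share of all quotes: {replacement_share:.1%}")
print(f"Usable share of replacements: {usable_share:.1%}")
print(f"Non-comparable share of replacements: {non_comparable_share:.1%}")
print(f"Priced quotes per non-comparable replacement: {quotes_per_non_comparable:.1f}")
```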
The idea behind hedonic models is that items can be thought of as bundles of item
characteristics. Hedonic models estimate values for the individual characteristics of a
good or a service (1). Parameter values from hedonic models can be used by CPI
analysts to calculate changes in quality between two items with different characteristics.
Most hedonics studies have historically been performed on items such as apparel and
appliances, where the purchaser determines the demand for the product. The demand for
college textbooks is not determined by the bookstores or the college students that
purchase books. Instead, demand is determined by third persons within the college, such as
Data for this study were from the March 1999 CPI data base. There were 338
observations. The CPI defines each item category with a checklist, which identifies
the specifications of the various items that fall in the category. The
checklist specifications correspond to the characteristics that a hedonic model needs. Even
though the College textbook checklist is not lengthy, a number of important specifications
were isolated. 'Price' was used as the dependent variable in the original regression
models. The independent variables selected were as follows:
- Book cover — Separate variables were created for hard book covers and soft book covers.
- Degree status of intended student and course — Individual variables were developed for
both graduate and undergraduate courses.
- Book size — Six different variables needed to be set up for different sizes of books.
The book sizes recognized (in inches) were 8.5 X 11, 8.5 X 9.5, 8.5 X 5.5, 7 X 10, 6 X 9,
- Book pages — Seven different variables were created for different book lengths. The
page categories listed were (a) under 150, (b) 151 - 300, (c) 301 - 500, (d) 501 - 800,
(e) 801 - 1150, (f) 1151 - 1500, and (g) over 1500.
- Course subjects — Numerous variables were developed for different course subject
matters. In some cases, related subject matters had to be combined to form broader
categories. This is the list of course variables that were set up:
- Book features — Some books have special features or extra items included. Three
different variables were set up to account for these areas.
- (1) Book has spiral binder feature
- (2) Book has color plate(s) feature
- (3) Book includes workbook, study guide, kit or disk
- Individual major publisher — Textbook publishers are important to this model. Studies
have shown that approximately 65% of the fees paid for college textbooks are retained by
the publishers (2). The possibility exists that part of the price of a textbook
may be tied up in publisher name recognition. However, the quotes in this item category
list far too many publishers to create an individual variable for each one. Eight of the
larger, more well-known educational publishers, each with eight or more books priced on
the data base, were isolated as individual major publisher variables. All other publishers
were grouped in a 'Not-major publisher' variable.
- Separate variables were also considered for size of city and CPI region of the country
where each bookstore can be found.
- Separate variables were established for the type of business (university bookstore,
community college bookstore, independent bookstore contracted by the college, etc.) that
was recorded for each quote.
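As an illustration of how checklist specifications like these become regression inputs, the following Python sketch turns a page count into 0/1 dummy variables for the seven page categories listed above, leaving one category out as the reference. The function and variable names are hypothetical, and the handling of a book with exactly 150 pages is an assumption, since the checklist ranges "under 150" and "151 - 300" leave that case open.

```python
# Upper page bounds for categories (a) through (f); anything longer is (g).
# Assumption: a book of exactly 150 pages is placed in category (a).
PAGE_BOUNDS = [(150, "a"), (300, "b"), (500, "c"), (800, "d"), (1150, "e"), (1500, "f")]


def page_category(pages):
    """Return the checklist page category letter for a page count."""
    for upper, label in PAGE_BOUNDS:
        if pages <= upper:
            return label
    return "g"


def page_dummies(pages, reference="a"):
    """Return 0/1 indicators for every page category except the reference."""
    category = page_category(pages)
    return {
        f"pages_{label}": int(label == category)
        for label in "abcdefg"
        if label != reference
    }


print(page_category(420))
print(page_dummies(420))
```

A book in the reference category gets zeros for every indicator, so its page effect is absorbed by the intercept.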
A number of programs were run to clean up the college textbook data in preparation for
running the regressions. During this process, a total of 123 quotes were deleted from
consideration, reducing the usable sample from 338 to 215 quotes. The remaining quotes
were deemed to be more than adequate to build the regressions.
Starting the Regressions
The first models were run using a linear functional form. The reference or base variables
(the dummy variable left out of each group so that the group is not perfectly collinear
with the intercept) were soft cover book,
undergraduate student, 8.5 X 11 book size, under 150 pages length, music subject, no
special features/extra items included, not-major publisher, city size C (smallest cities),
region 3 (south), and business type #1 (full price department store).
R-Square = 0.7853, Adj R-Square = 0.7146
Book size variables: 8.5 X 9.5, 8.5 X 5.5, 7 X 10, 6 X 9
Book pages variables: 150 - 300 pages, 301 - 500 pages, 501 - 800 pages, 801 - 1150 pages, 1151 - 1500 pages, Over 1500 pages
Model One Analysis
The first linear model had some encouraging results. The R-Square of .78 and the
adjusted R-Square of .71 meant that roughly three-quarters of the variation in the
dependent variable 'Price' had been explained. The intercept parameter estimate seemed
high at $26, especially since many
books on the data base had a retail price of $15 to $30. However, this was more easily
understood when viewing the independent variables. The preferred T-Values were figures
above 2.0 in absolute value, while the preferred significance levels (p-values) were
figures under .05.
The variables for cover type, degree status, and number of pages modeled reasonably well
on the whole. Hard cover books are more expensive than soft cover books. Model One bore out
this fact with the parameter estimate that hard cover books are $16 more expensive than
soft cover books. The Degree status variable yielded borderline, yet surprising results.
Most people intuitively believe that graduate level books cost more than undergraduate
books. Model One listed graduate books as slightly more than $6 cheaper than undergraduate
books. There were some successes with the number of pages variables. The statistics with
higher significance belonged to the middle variables representing book lengths from 301 to
1500 pages. The variable for the shortest books had poorer T-statistics, but not poor
enough to justify its deletion. The variable for the longest books possessed excellent
statistics; however, contrary to expectations, it possessed a lower parameter value than the
variable for the next smaller group of books.
The different course subject variables yielded mixed results. Some possessed
significant statistics, while others did not. The seemingly surprising fact that the sign
for around one-third of the subjects was negative simply meant that the excluded subject
(music) did not represent the books that were the cheapest on average. This also explained
the high intercept value of $26, since a $20 book could be explained by taking the $26
intercept and subtracting one of the negative course subject parameter values.
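The arithmetic behind this explanation can be sketched in Python. The intercept of $26 and the hard cover premium of $16 are the approximate Model One values reported in the text; the subject coefficient of -6 is hypothetical and is chosen only to reproduce the $20 example.

```python
def predict_price(intercept, coefficients, active_variables):
    """Linear hedonic prediction: intercept plus the coefficient of each active dummy."""
    return intercept + sum(coefficients[name] for name in active_variables)


INTERCEPT = 26.0  # approximate Model One intercept reported in the text

coefficients = {
    "hard_cover": 16.0,            # approximate Model One estimate reported in the text
    "subject_hypothetical": -6.0,  # hypothetical subject coefficient, illustration only
}

# A soft cover book (the reference cover type) in the hypothetical subject.
soft_cover_price = predict_price(INTERCEPT, coefficients, ["subject_hypothetical"])

# The same book in hard cover.
hard_cover_price = predict_price(
    INTERCEPT, coefficients, ["hard_cover", "subject_hypothetical"]
)

print(soft_cover_price, hard_cover_price)
```

A book that matches the reference category in every group has no active dummies, so its predicted price is the intercept itself.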
None of the eight variables for the individual major educational publishers tested as
significant. However, since publishers receive a large share of textbook revenue, these
variables should be considered further. Possibly combining these publishers can produce
A number of variables tested poorly enough to consider deletion. Most of the book size
variables did not model very well. The majority of them did not test as significant. The
8.5 X 9.5 size variable possessed a positive sign contrary to expectations, and the
parameter estimates did not line up sensibly from one book size to the next. The
variables for book features/extra items fared poorly, which was surprising. Intuitively,
one would think that providing more features with the product would add product value. The
variables for city size, region of the country, and business type modeled poorly. All of
these variables were deleted in the next iteration, removing them from further
Linear Model Modifications
Many changes were made to the input data from Model One in order to make improvements.
The following are the bulk of the changes that were made to ultimately arrive at the
successful Model Two:
Some combinations were created from the number of pages variables that had borderline
performances in Model One. The variables for under 150 pages and 151 to 300 pages were
combined to form a variable for 300 or fewer pages. Also, the variables for 1151 to 1500
pages and over 1500 pages were combined to form a variable that covered any book over 1150
pages. The other three number of pages variables were allowed to remain as they were.
The eight individual major publisher variables were all combined together to form one
large major publisher variable. The hope was that the major publishers as a unified group
variable would produce significant results.
One of the first changes for the course subjects variables was to change the excluded
variable in the model. Model One seemed to indicate that History books were, on average,
the cheapest. Changing the reference variable to History books allowed the remaining
course subject variables to all display positive values, as well as significantly lowering
the intercept parameter estimate. The mixed performances from the course subjects
variables led to combining of related variables to form broader variable categories —
BUSACCMP = Business, Accounting, & Computers combined.
BIOCHPHY = Biology, Chemistry, & Physics/Physical Science combined.
GENERCLS = All other subject variables combined; exceptions noted in the following.
NOTE — Calculus, Engineering, English, History & Mathematics were allowed to
remain as individual variables.
NOTE — Other variable combinations, such as combining Psychology and Sociology, were
tried. None produced significant results.
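The regrouping of course subjects can be pictured as a lookup from the original subject to its Model Two variable. In this Python sketch the names BUSACCMP, BIOCHPHY, and GENERCLS come from the text; the upper-case names used for the subjects that remained individual variables, and the function name, are hypothetical.

```python
# Subjects merged into combined variables, as described in the text.
COMBINED = {
    "Business": "BUSACCMP",
    "Accounting": "BUSACCMP",
    "Computers": "BUSACCMP",
    "Biology": "BIOCHPHY",
    "Chemistry": "BIOCHPHY",
    "Physics/Physical Science": "BIOCHPHY",
}

# Subjects that were allowed to remain as individual variables.
STANDALONE = {"Calculus", "Engineering", "English", "History", "Mathematics"}


def model_two_subject(subject):
    """Map an original course subject to its Model Two variable name."""
    if subject in STANDALONE:
        return subject.upper()  # hypothetical naming for the individual variables
    # Every subject not listed above falls into the general category.
    return COMBINED.get(subject, "GENERCLS")


for subject in ["Chemistry", "Calculus", "Psychology"]:
    print(subject, "->", model_two_subject(subject))
```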
R-Square = 0.7373, Adj R-Square = 0.7189
Book pages variables: 301 - 500 Pages, 501 - 800 Pages, 801 - 1050 Pages, 1051 or more Pages
Excluded variables: Soft cover, Undergraduate student, Under 300 pages, History,
This model performed well. The R-Square value of almost .74 and the adjusted
R-Square value of almost .72 again indicated that around three-fourths of the variation
in the dependent variable 'Price' had been explained. The signs for all of the independent
variables appeared to be correct, and all of the parameter estimates appeared to be
acceptable. The T-Values and significance levels for almost all of the independent
variables were excellent. Among the exceptions, the statistics for the Intercept were
marginal and the statistics for English were poor. In this case, the English variable was
allowed to stand since no other variable seemed appropriate to combine with English.
One surprising factor was that the Calculus variable was able to stand on its own in
all models, including Model Two. This subject seemed to defy a combination with areas such
as algebra and geometry in order to form a comprehensive Mathematics variable. Today, most
calculus books are deemed to be expensive, and they seem to retain reasonable value when
students resell them as used books. On the other hand, the cleaned data base used for this
study had no more than five quotes where calculus was priced. Therefore, the Calculus
variable may need supplemental data in future studies.
Model Two Testing
Two types of testing were planned for Model Two. Applications for both types of testing
are presented here.
- The first testing involved insertion of model parameter values to determine book prices
using college textbook descriptions from the CPI data base. The calculated prices were
then compared with actual data base textbook prices. In the majority of cases the Model
Two predicted values produced estimates that deviated less than 15% from the data base
prices. Most of the exceptions were noted on lower priced books, especially those with a
retail price below $20.
- The second type of testing was carried out over a period of five months. As
non-comparable replacements arrived in the CPI office, Model Two parameter values were
applied to the quotes to see if the parameter estimates could be used to quality
adjust the quotes. After this testing period, a summary of the testing revealed that
Model Two could be used successfully to quality adjust approximately 85% of the
non-comparable replacements. The majority of the exceptions, again, were textbooks priced
One example from this testing was a non-comparable replacement of a college English
book. The replacement book produced an unadjusted price increase of 24.5% for the quote.
Applying the parameter estimates from Model Two reduced the quote
price increase to 4.4%.
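A minimal Python sketch of how a hedonic quality adjustment of this kind can work is given below. It assumes one common approach: the old book's price is adjusted by the model's value of the characteristics that differ between the two books, and the new price is then compared with that adjusted price. The coefficients and prices are hypothetical and do not reproduce the English book example, and the exact procedure used in CPI production is not described in the text.

```python
def quality_adjusted_change(old_price, new_price, coefficients, old_features, new_features):
    """Return (raw_change, adjusted_change) as fractions.

    The old price is shifted by the model's value of the characteristics
    that differ, then compared with the new price.
    """
    old_value = sum(coefficients[name] for name in old_features)
    new_value = sum(coefficients[name] for name in new_features)
    adjusted_old_price = old_price + (new_value - old_value)
    raw_change = new_price / old_price - 1
    adjusted_change = new_price / adjusted_old_price - 1
    return raw_change, adjusted_change


# Hypothetical coefficients (dollars) and prices, for illustration only.
coefficients = {"hard_cover": 5.0, "pages_301_500": 6.0}

raw, adjusted = quality_adjusted_change(
    old_price=40.0,
    new_price=50.0,
    coefficients=coefficients,
    old_features=["pages_301_500"],                # old book: soft cover
    new_features=["hard_cover", "pages_301_500"],  # replacement: hard cover
)

print(f"Unadjusted change: {raw:.1%}")
print(f"Quality-adjusted change: {adjusted:.1%}")
```

When the two books share all modeled characteristics, the adjustment is zero and the two changes coincide.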
Additional Model Two Research
The largest concern from the Model Two textbook testing was that quotes with lower
prices (under $20) at times were not performing well. Since number of pages seemed to be a
very important price factor, a graph was created to plot the price for each textbook
compared to the corresponding page range for each textbook. The theory was that lower
priced textbooks could conceivably require a different model than Model Two.
The graph was created and analyzed. The numbers for 'pages' in this graph corresponded
to the page
range variables as follows:
1 = under 150 pages
2 = 151 - 300 pages
3 = 301 - 500 pages
4 = 501 - 800 pages
5 = 801 - 1150 pages
6 = 1151 - 1500 pages
7 = over 1500 pages
The plot did not seem to indicate that a different model was needed for lower priced
textbooks. Surprisingly, the price for books with more pages seemed to steadily rise with
the page range until around 1000 to 1100 pages. At that point, there seemed to exist a
condition of 'diminishing returns' where more pages did not cost more money, and in many
cases actually cost less.
In order to more fully understand this situation, a correlation matrix was created that
examined data for the Model Two variables in correlation with each other. The correlation
data seemed to suggest that the above plot could be explained by the relationship of
different book subjects to the number of pages and the book price.
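For readers unfamiliar with the technique, a correlation matrix of this kind can be computed as in the Python sketch below. The six observations are hypothetical and are not drawn from the CPI data base; the column names are illustrative. The sketch assumes that no column is constant, since a constant column has zero variance and an undefined correlation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical observations, for illustration only.
data = {
    "price": [18.0, 35.0, 62.0, 80.0, 30.0, 25.0],
    "page_range": [1, 3, 4, 4, 6, 7],
    "calculus": [0, 0, 0, 1, 0, 0],
}

matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```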
More expensive books, such as calculus and other math books, did not necessarily have a
large number of pages. Many of these books were located in the middle page ranges. On the
other hand, many of the lower priced textbooks, such as English books, had well over 1000
Improvements for lower priced books could potentially be realized by re-estimating the
Model Two specification in log-linear form. A third model was created
using the same independent variables that had been used in Model Two. In Model Three the log
of price was used for the dependent variable, which established this as a log-linear
R-Square = 0.7158, Adj R-Square = 0.6959
Book pages variables: 301 - 500 Pages, 501 - 800 Pages, 801 - 1050 Pages, 1051 or more Pages
Excluded variables: Soft cover, Undergraduate student, Under 300 pages, History,
This model also seemed to perform well. The R-Square of almost .72 was quite similar to
the R-Square value for Model Two. The T-Values and significance levels for most of the
independent variables were excellent as well. Unfortunately, testing led to Model Three's
Model Three was tested by inserting model parameter values to determine book prices
using college textbook descriptions. Some of the same quotes from the data base that had
been used to test Model Two were used to test Model Three. Quotes were used that possessed
a wide range of textbook prices. In approximately 80% of the quotes tested the linear
model provided a better prediction of actual quote price than the log-linear model.
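The two functional forms turn parameter values into a predicted price in different ways: the linear model adds dollar amounts, while the log-linear model adds terms to the log of price, so the sum must be exponentiated. The Python sketch below shows the log-linear case with hypothetical coefficients. Exponentiating the fitted log price is the simplest retransformation and is an assumption here; the text does not say how Model Three predictions were converted to dollars.

```python
import math


def predict_log_linear(intercept, coefficients, active_variables):
    """Predicted price from a log-linear model: exp of the fitted log price."""
    log_price = intercept + sum(coefficients[name] for name in active_variables)
    return math.exp(log_price)


def percent_effect(coefficient):
    """Proportional price effect of switching a dummy variable on."""
    return math.exp(coefficient) - 1


# Hypothetical values: a base price of $20 and a hard cover coefficient
# that raises price by 50 percent.
log_intercept = math.log(20.0)
log_coefficients = {"hard_cover": math.log(1.5)}

price = predict_log_linear(log_intercept, log_coefficients, ["hard_cover"])
print(f"Predicted price: {price:.2f}")
print(f"Hard cover effect: {percent_effect(log_coefficients['hard_cover']):.0%}")
```

Because each coefficient acts multiplicatively, the same characteristic adds more dollars to an expensive book than to a cheap one, which is the property that motivated trying this form for lower priced books.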
To list an example, Model Two and Model Three parameter values were applied to the
textbook description for one particular mathematics book on the CPI data base. The
following were the results:
Actual quote price = $75.00
Predicted value using Model Two = $80.27
Predicted value using Model Three = $82.85
Therefore, the conclusion has been drawn that the Model Two linear model has done a
better job of modeling the college textbook entry level item.
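For the mathematics book example, the size of each model's miss can be computed from the three figures reported above. This Python check uses only those figures; the dictionary keys are illustrative labels.

```python
actual_price = 75.00
predictions = {
    "Model Two (linear)": 80.27,
    "Model Three (log-linear)": 82.85,
}

# Absolute prediction error as a share of the actual quote price.
errors = {
    name: abs(predicted - actual_price) / actual_price
    for name, predicted in predictions.items()
}

for name, error in errors.items():
    print(f"{name}: off by {error:.1%}")

closest_model = min(errors, key=errors.get)
print("Closest prediction:", closest_model)
```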
Model Two Usage in CPI
This researcher recommended that CPI management approve the Model Two linear model for
quality adjustment usage in College textbooks. In the Spring of 2000, the CPI
program management approved this model for use in quality adjusting College textbook
quotes. Announcements have been made to the public beginning in April of 2000. Model usage
is scheduled to begin with the CPI for July 2000, in time for textbook replacements in
Autumn of 2000.
Hedonic Study Verification
In keeping with the CPI requirement that all hedonic studies must be verified by other
personnel skilled in the use of hedonics, this college textbook hedonic study was
reviewed and approved by staff from BLS Price Index Number Research Division (3).
Data from this study have also been reviewed by the CPI Hedonics Team.
Further hedonics research is planned for college textbooks. Experimental index
calculations are planned using the data for the upcoming Autumn college textbook
quotes. Also, another hedonics study using data from a newer CPI data base is planned
within the next year.
(1)"The Use of Hedonic Regressions to Handle Quality Change: The Experience in the
U.S. CPI;" by Dennis Fixler, Charles Fortuna, John Greenlees, and Walter Lane,
presented at the Fifth Meeting of the International Working Group on Price Indices; August
(2)"Where the 'New' Textbook Dollar Goes",
www.nacs.org/public/research/higher_ed_retail.asp. These data reveal 75.9 cents of every
dollar students pay for textbooks will be received by the publishers. From this, 11.5
cents will be forwarded to the authors of the textbooks. This leaves 64.4 cents out of
every textbook dollar (or almost 65%) that will be retained by the publishers.
(3)The author wishes to thank Mary Kokoski of the BLS Price Index Number Research
Division for all of her contributions to the success of this project.
Last Modified Date: October 16, 2001