Appendix Statistical Definition and Estimation of Price Indexes
This report addresses foundational economic concepts for cost-of-living or price indexes. In the panel’s view, the concepts must reflect the reality of the marketplace; they must capture the change in real prices paid by real consumers. The concepts must be measurable in the context of a system of surveys and other data collection activities that the Bureau of Labor Statistics (BLS) can feasibly implement.
An important step in assessing the measurability and reality of a particular price index concept is to express the concept statistically in the form of a population parameter to be estimated. If one can write down the parameter, one can examine the feasibility of surveys and other data collection activities necessary to produce accurate statistical estimators of the parameters. One can also examine whether the parameter is defined in terms of the prices actually paid by consumers.
In what follows, we translate our concepts into explicit population parameters. We define the price indexes motivated by our concepts and demonstrate briefly the survey data required to estimate the indexes.
To begin, consider a simple world in which there is only one good, two time periods (base and comparison), and a static universe of households (HH), denoted by the set H. For cases in which it would be better to work in terms of subgroups within HHs called consumer units (CU), let H denote the universe of CUs.
Next, let us introduce the bulk of the requisite notation. Let
i signify the HH (i = 1, . . . , N),
j the purchase occasion,
J_{i0} the set of purchase occasions by the ith HH in base period 0,
J_{it} the set of purchase occasions by the ith HH in comparison period t,
Q_{gij0} the number of units of good g purchased by the ith HH, jth purchase occasion, during base period 0,
Q_{gijt} the number of units of good g purchased by the ith HH, jth purchase occasion, during comparison period t,
N the number of households in the universe,
p_{gij0} the price per unit (of good g) paid by the ith HH, jth purchase occasion, during base period 0, and
p_{gijt} the price per unit (of good g) paid by the ith HH, jth purchase occasion, during comparison period t.
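In these definitions, we use the convention that the set of purchase occasions is empty, and the number of units purchased is therefore zero, for nonbuyers in the base period and likewise for nonbuyers in the comparison period. We assume there is at least one buyer in each period, so that the total units purchased satisfy Q_{0} > 0 and Q_{t} > 0.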
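Average unit volumes, \bar{Q}_{0} = Q_{0}/N_{0} and \bar{Q}_{t} = Q_{t}/N_{t}, and average prices per unit, \bar{P}_{0} = Y_{0}/Q_{0} and \bar{P}_{t} = Y_{t}/Q_{t}, are defined in the obvious way, where Y denotes total dollar volume, Q total units purchased, and N the number of HHs in the period indicated by the subscript. The decomposition of the period-to-period trend in total dollar volume is now given by
Y_{t}/Y_{0} = (N_{t}/N_{0}) (\bar{Q}_{t}/\bar{Q}_{0}) (\bar{P}_{t}/\bar{P}_{0}) = T_{N} T_{q} T_{p},
where T_{N} is the trend in the total HH count, T_{q} is again the trend in average units per HH, and T_{p} is again the trend in average price per unit. As above, T_{p} may be called the price index and T_{q} the unit volume index.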
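We can next extend the work to a still more realistic world in which a static set of goods is available in the market at both time periods. Let subscript g signify a good, and to simplify the notation let G represent both the set and the number of goods. Total dollar volumes are now defined by
Y_{0} = \sum_{g=1}^{G} Y_{g0}, with Y_{g0} = \sum_{i=1}^{N_{0}} \sum_{j \in J_{i0}} p_{gij0} Q_{gij0},
and
Y_{t} = \sum_{g=1}^{G} Y_{gt}, with Y_{gt} = \sum_{i=1}^{N_{t}} \sum_{j \in J_{it}} p_{gijt} Q_{gijt},
for base and comparison periods, respectively. Average unit volumes, \bar{Q}_{g0} = Q_{g0}/N_{0} and \bar{Q}_{gt} = Q_{gt}/N_{t}, and average prices per unit, \bar{P}_{g0} = Y_{g0}/Q_{g0} and \bar{P}_{gt} = Y_{gt}/Q_{gt}, are defined in the obvious way, where Q_{g0} and Q_{gt} are the total units of good g purchased in each period. Also, define the G × 1 vectors of average unit volumes and average prices per unit
\bar{Q}_{0} = (\bar{Q}_{10}, . . . , \bar{Q}_{G0})', \bar{Q}_{t} = (\bar{Q}_{1t}, . . . , \bar{Q}_{Gt})',
and
\bar{P}_{0} = (\bar{P}_{10}, . . . , \bar{P}_{G0})', \bar{P}_{t} = (\bar{P}_{1t}, . . . , \bar{P}_{Gt})'.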
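Then, the period-to-period trend in total dollar volume is given by
Y_{t}/Y_{0} = (N_{t}/N_{0}) (\bar{P}_{t}'\bar{Q}_{t} / \bar{P}_{t}'\bar{Q}_{0}) (\bar{P}_{t}'\bar{Q}_{0} / \bar{P}_{0}'\bar{Q}_{0}) = T_{N} T_{Pq} T_{Lp},
where \bar{P} and \bar{Q} denote the vectors of average prices and average unit volumes, T_{N} is again the trend in the total HH count, T_{Pq} is the trend in average units per HH, and T_{Lp} is the trend in average price per unit. The trend in average units is weighted by comparison prices, and thus one might view T_{Pq} as a Paasche index of unit volume. Since the trend in average prices is weighted by base unit volumes, one might view T_{Lp} as a Laspeyres price index.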
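An alternative decomposition of the trend is
Y_{t}/Y_{0} = T_{N} T_{Lq} T_{Pp}, with T_{Lq} = \bar{P}_{0}'\bar{Q}_{t} / \bar{P}_{0}'\bar{Q}_{0} and T_{Pp} = \bar{P}_{t}'\bar{Q}_{t} / \bar{P}_{0}'\bar{Q}_{t},
where T_{Lq} is a Laspeyres index of unit volume and T_{Pp} is a Paasche price index. A second alternative decomposition of the trend is
Y_{t}/Y_{0} = T_{N} T_{Fq} T_{Fp}, with T_{Fq} = (T_{Lq} T_{Pq})^{1/2} and T_{Fp} = (T_{Lp} T_{Pp})^{1/2},
where T_{Fq} and T_{Fp} are Fisher indexes of unit volume and prices, respectively.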
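The three decompositions can be checked numerically. The sketch below is illustrative only: the function names, the two-good market, and all numbers are assumptions invented for the example, not data from BLS. It takes per-good average units and average prices, forms the Laspeyres, Paasche, and Fisher components, and confirms that each product reproduces the trend in total dollar volume.

```python
import math

def dot(prices, units):
    """Inner product of a price vector and a unit-volume vector."""
    return sum(p * q for p, q in zip(prices, units))

def trend_components(n0, nt, q0, qt, p0, pt):
    """Trend components from HH counts and per-good average units and prices."""
    t_lq = dot(p0, qt) / dot(p0, q0)  # Laspeyres unit volume: base-period prices
    t_pq = dot(pt, qt) / dot(pt, q0)  # Paasche unit volume: comparison-period prices
    t_lp = dot(pt, q0) / dot(p0, q0)  # Laspeyres price: base-period units
    t_pp = dot(pt, qt) / dot(p0, qt)  # Paasche price: comparison-period units
    return {
        "T_N": nt / n0,
        "T_Lq": t_lq, "T_Pq": t_pq,
        "T_Lp": t_lp, "T_Pp": t_pp,
        "T_Fq": math.sqrt(t_lq * t_pq),  # Fisher: geometric mean of the two
        "T_Fp": math.sqrt(t_lp * t_pp),
    }

# Hypothetical two-good market (all values invented for illustration).
n0, nt = 100, 110                      # HH counts
q0, qt = [2.0, 1.0], [1.8, 1.5]        # average units per HH, by good
p0, pt = [1.00, 4.00], [1.10, 4.20]    # average price per unit, by good

t = trend_components(n0, nt, q0, qt, p0, pt)
total_trend = (nt * dot(pt, qt)) / (n0 * dot(p0, q0))

# Each product should reproduce the trend in total dollar volume.
print(math.isclose(t["T_N"] * t["T_Pq"] * t["T_Lp"], total_trend))
print(math.isclose(t["T_N"] * t["T_Lq"] * t["T_Pp"], total_trend))
print(math.isclose(t["T_N"] * t["T_Fq"] * t["T_Fp"], total_trend))
```

Note that a Paasche volume index always pairs with a Laspeyres price index, and vice versa; only the Fisher pair is symmetric.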
Finally, we reach the real world in which both the sets of goods marketed, G_{0} and G_{t}, and the sets of households, H_{0} and H_{t}, vary by period. Partition the set of goods marketed at the base period by
G_{0} = G_{0Q} ∪ G
and partition the set of goods marketed at the comparison period by
G_{t} = G ∪ G_{tE},
where G_{0Q} denotes exiting goods, G_{tE} denotes entering goods, and G denotes both continuing and linkable goods.
Continuing goods are marketed in both time periods, while exiting goods appear in the base period but not in the comparison period, and entering goods appear in the comparison period but not in the base period. There is a gray area that we have called linkable goods. These are goods for which there is no exact match between the predecessor good and the successor good, but for which economic theory nevertheless accepts the link for purposes of index number construction. BLS has linkage rules or criteria that it currently uses in producing the monthly CPI.
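Period-to-period trend in total dollar volume is now
Y_{t}/Y_{0} = (R_{0}/R_{t}) (\sum_{g \in G} Y_{gt} / \sum_{g \in G} Y_{g0}),
where
R_{0} = \sum_{g \in G} Y_{g0} / \sum_{g \in G_{0}} Y_{g0}
is the continuing and linkable volume as a proportion of the total base volume, and
R_{t} = \sum_{g \in G} Y_{gt} / \sum_{g \in G_{t}} Y_{gt}
is the continuing and linkable volume as a proportion of the total comparison volume. Here Y_{g0} and Y_{gt} denote the total dollar volume of good g in the base and comparison periods.
Let
T_{R} = R_{t}/R_{0}
be the trend in the proportion continuing or linkable. Then building on the above, the trend in total dollar volume can be decomposed as
Y_{t}/Y_{0} = T_{R}^{-1} T_{N} T_{Pq} T_{Lp},
where T_{Pq} and T_{Lp} are computed over the continuing and linkable goods in G.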
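Alternative price indexes based on continuing and linkable goods are given by
T_{Lp} = \bar{P}_{t}'\bar{Q}_{0} / \bar{P}_{0}'\bar{Q}_{0}, T_{Pp} = \bar{P}_{t}'\bar{Q}_{t} / \bar{P}_{0}'\bar{Q}_{t}, and T_{Fp} = (T_{Lp} T_{Pp})^{1/2},
where the vectors of average prices \bar{P} and average unit volumes \bar{Q} are restricted to the goods in G, T_{Lp} is the Laspeyres price index, T_{Pp} is the Paasche price index, and T_{Fp} is the Fisher price index. All price indexes discussed here extend naturally to a time series of comparison periods.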
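A small numerical sketch may help show how entering and exiting goods enter the decomposition. Everything here is an assumption made for illustration: the goods, the numbers, and the names r0, rt, and t_r, which stand for the continuing-and-linkable share of dollar volume in each period and the trend in that share. The price and volume indexes are computed over the continuing goods only, and the trend in total dollar volume is recovered by dividing by the trend in the share.

```python
def dollar_volume(n, price, units, goods):
    """Total dollar volume over `goods`: HH count times the sum of
    average price times average units per HH."""
    return n * sum(price[g] * units[g] for g in goods)

# Hypothetical market: "vcr" exits after the base period, "dvd" enters.
n0, nt = 100, 110
p0 = {"bread": 2.0, "vcr": 200.0, "milk": 1.5}
q0 = {"bread": 10.0, "vcr": 0.1, "milk": 8.0}
pt = {"bread": 2.2, "milk": 1.6, "dvd": 150.0}
qt = {"bread": 9.0, "milk": 8.5, "dvd": 0.2}

continuing = sorted(set(p0) & set(pt))     # goods present in both periods

y0 = dollar_volume(n0, p0, q0, p0)         # all base-period goods
yt = dollar_volume(nt, pt, qt, pt)         # all comparison-period goods

r0 = dollar_volume(n0, p0, q0, continuing) / y0   # continuing share, base
rt = dollar_volume(nt, pt, qt, continuing) / yt   # continuing share, comparison
t_r = rt / r0                                     # trend in the share

# Indexes restricted to continuing goods.
t_n = nt / n0
t_lp = (sum(pt[g] * q0[g] for g in continuing)
        / sum(p0[g] * q0[g] for g in continuing))   # Laspeyres price
t_pq = (sum(pt[g] * qt[g] for g in continuing)
        / sum(pt[g] * q0[g] for g in continuing))   # Paasche unit volume

print(yt / y0, t_n * t_pq * t_lp / t_r)    # the two values should agree
```

In this made-up example the entering good takes a larger share of spending than the exiting good gave up, so the continuing share falls and the trend in total dollar volume exceeds the trend measured on continuing goods alone.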
In the first two formulations, we faced a simple world with only one good. In this world, the price index
T_{p} = \bar{P}_{t}/\bar{P}_{0},
the trend in average price per unit, is both plutocratic and democratic.
In the third formulation, we faced a limited world in which a static set of goods is available in the marketplace. In this world, the plutocratic Laspeyres price index can be rewritten as
T_{Lp} = \sum_{g=1}^{G} w_{g0} (\bar{P}_{gt}/\bar{P}_{g0}),
where the plutocratic weight applied to the simple trend in average price,
w_{g0} = \bar{P}_{g0}\bar{Q}_{g0} / \sum_{h=1}^{G} \bar{P}_{h0}\bar{Q}_{h0},
is simply market share expressed in dollars calculated across all HHs in the base period population with respect to the total market basket, G. Given the same assumptions, the democratic Laspeyres price index is defined by
I_{Lp} = \sum_{g=1}^{G} \bar{D}_{g0} (\bar{P}_{gt}/\bar{P}_{g0}),
where the democratic weight applied to the simple trend in average price,
\bar{D}_{g0} = (1/N_{0}) \sum_{i=1}^{N_{0}} D_{gi0},
is the unweighted population mean across all HHs in the base period population of the HH-specific market shares D_{gi0}. Thus, plutocratic weights are ratios of means and democratic weights are means of ratios. Similar weighting yields I_{Pp}, a democratic Paasche price index, and I_{Fp}, a democratic Fisher price index. It is straightforward to establish the following relationship between plutocratic and democratic weights:
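w_{g0} = \bar{D}_{g0} + Cov(D_{gi0}, Y_{+i0}) / \bar{Y}_{+0},
where w_{g0} and \bar{D}_{g0} denote the plutocratic and democratic weights for good g, Cov denotes the population covariance across HHs,
D_{gi0} = Y_{gi0}/Y_{+i0}
is market share within the ith household, with Y_{gi0} the dollars spent by the ith HH on good g in the base period,
Y_{+i0} = \sum_{g=1}^{G} Y_{gi0}
is total consumption volume by the ith HH in the base period, and
\bar{Y}_{+0} = (1/N_{0}) \sum_{i=1}^{N_{0}} Y_{+i0}
is the population mean per HH of total consumption volume. Thus, the plutocratic weight exceeds (is exceeded by) the democratic weight for any good that displays a positive (negative) correlation between total HH consumption and HH market share. The weights are equal in the event of zero correlation. For example, let g = automobiles. If there is positive correlation between total HH consumption and the share of HH consumption on automobiles, the plutocratic weight will exceed the democratic weight.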
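Across all goods, one can now conclude the following relationship between price indexes:
T_{Lp} - I_{Lp} = \sum_{g=1}^{G} [Cov(D_{gi0}, Y_{+i0}) / \bar{Y}_{+0}] (\bar{P}_{gt}/\bar{P}_{g0}),
where Cov(D_{gi0}, Y_{+i0}) is the population covariance between HH market share for good g and total HH consumption, \bar{Y}_{+0} is mean total consumption per HH, and \bar{P}_{gt}/\bar{P}_{g0} is the trend in the average price of good g.
The difference between the price indexes is determined by the pattern of covariances and price trends across goods. If goods for which the covariance is positive experience relatively large increases in average price, plutocratic price indexes may exceed their democratic counterparts. In general, however, the direction of the difference between the price indexes is far from certain for a given comparison period, t, let alone across periods. This matter is ripe for empirical investigation.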
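The link between the two sets of weights can be illustrated with a toy population. The sketch below is an assumption-laden example, not a description of any BLS procedure: the three households, their spending, the price trends, and the function name are all invented. It computes plutocratic weights (ratios of means), democratic weights (means of ratios), and the population covariance (divisor N) between HH share and total HH spending, then compares the resulting Laspeyres-type indexes.

```python
def weights_and_covariances(spend):
    """spend[i][g]: base-period dollars spent by household i on good g.
    Every household is assumed to have positive total spending."""
    n = len(spend)
    goods = sorted({g for hh in spend for g in hh})
    totals = [sum(hh.values()) for hh in spend]   # total spending per HH
    mean_total = sum(totals) / n                  # mean total spending per HH
    pluto, demo, cov = {}, {}, {}
    for g in goods:
        shares = [hh.get(g, 0.0) / y for hh, y in zip(spend, totals)]
        # Plutocratic weight: aggregate dollar share (ratio of means).
        pluto[g] = sum(hh.get(g, 0.0) for hh in spend) / sum(totals)
        # Democratic weight: mean of HH-specific shares (mean of ratios).
        demo[g] = sum(shares) / n
        # Population covariance between HH share and total HH spending.
        cov[g] = sum((d - demo[g]) * (y - mean_total)
                     for d, y in zip(shares, totals)) / n
    return pluto, demo, cov, mean_total

# Hypothetical population: richer households spend a larger share on autos.
spend = [
    {"food": 300.0, "autos": 0.0},
    {"food": 400.0, "autos": 200.0},
    {"food": 500.0, "autos": 1000.0},
]
# Hypothetical trends in average price, by good.
price_relatives = {"food": 1.02, "autos": 1.10}

pluto, demo, cov, mean_total = weights_and_covariances(spend)
t_lp = sum(pluto[g] * price_relatives[g] for g in pluto)   # plutocratic index
i_lp = sum(demo[g] * price_relatives[g] for g in demo)     # democratic index
gap = sum(cov[g] / mean_total * price_relatives[g] for g in cov)

print(pluto, demo)
print(t_lp - i_lp, gap)   # the two values should agree
```

Here the good with positive covariance (autos) also has the larger price increase, so the plutocratic index exceeds the democratic one; reversing the price trends would reverse the sign of the difference.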
The democratic price indexes, and the relationship between plutocratic and democratic price indexes just discussed, extend naturally to the real-world situation described above where the domain of goods varies by period.
There are at least two approaches to estimating the price indexes: household (HH) survey data and store survey data. In this section, we explore the first approach; in the next section we look at the second.
Let s_{0} and s_{t} denote probability samples drawn from the universe of HHs at times 0 and t, respectively. At each time period, assume that BLS collects unit volume and prices for all buying occasions for all goods from each HH, i, in the sample. Comprehensive data of this kind are not currently collected by any BLS survey. It might be feasible—using scanning technology or other approaches—to design surveys to collect such data.
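Let W_{it} and W_{i0} denote survey weights such that
\hat{Q}_{gt} = \sum_{i \in s_{t}} W_{it} \sum_{j \in J_{it}} Q_{gijt}
and
\hat{Q}_{g0} = \sum_{i \in s_{0}} W_{i0} \sum_{j \in J_{i0}} Q_{gij0}
are essentially design-unbiased estimators of the totals Q_{gt} and Q_{g0}, respectively. Similarly, define the estimated totals
\hat{Y}_{gt} = \sum_{i \in s_{t}} W_{it} \sum_{j \in J_{it}} p_{gijt} Q_{gijt} and \hat{Y}_{g0} = \sum_{i \in s_{0}} W_{i0} \sum_{j \in J_{i0}} p_{gij0} Q_{gij0}.
One can estimate the price indexes:
\hat{T}_{Lp} = \sum_{g} (\hat{Y}_{gt}/\hat{Q}_{gt}) \hat{Q}_{g0} / \sum_{g} \hat{Y}_{g0}
and
\hat{T}_{Pp} = \sum_{g} \hat{Y}_{gt} / \sum_{g} (\hat{Y}_{g0}/\hat{Q}_{g0}) \hat{Q}_{gt}.
Estimators for the other trends and indexes are defined in the obvious way.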
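As an illustration of estimation from HH survey data, the sketch below forms weighted totals of units and dollars by good and then substitutes them into Laspeyres and Paasche price indexes. The record layout, function names, survey weights, and purchase data are all assumptions invented for the example; no current BLS survey collects data in this form. The sketch also assumes that every good is purchased by at least one sample HH in both periods.

```python
from collections import defaultdict

def estimate_totals(sample):
    """sample: list of (survey_weight, purchases) pairs, one per sample HH;
    purchases: list of (good, units, price_per_unit), one per purchase occasion.
    Returns weighted estimates of total units and total dollars by good."""
    q_hat = defaultdict(float)
    y_hat = defaultdict(float)
    for weight, purchases in sample:
        for good, units, price in purchases:
            q_hat[good] += weight * units
            y_hat[good] += weight * units * price
    return dict(q_hat), dict(y_hat)

def laspeyres_price(q0, y0, qt, yt):
    """Estimated comparison-period average prices applied to base-period units."""
    return sum(yt[g] / qt[g] * q0[g] for g in q0) / sum(y0.values())

def paasche_price(q0, y0, qt, yt):
    """Estimated base-period average prices applied to comparison-period units."""
    return sum(yt.values()) / sum(y0[g] / q0[g] * qt[g] for g in qt)

# Hypothetical samples: two HHs per period, with invented weights.
s0 = [
    (50.0, [("bread", 2, 1.00), ("milk", 1, 2.00)]),
    (50.0, [("bread", 1, 1.20)]),
]
st = [
    (40.0, [("bread", 2, 1.10)]),
    (60.0, [("bread", 1, 1.30), ("milk", 2, 2.10)]),
]

q0, y0 = estimate_totals(s0)
qt, yt = estimate_totals(st)
print(laspeyres_price(q0, y0, qt, yt))
print(paasche_price(q0, y0, qt, yt))
```

Because the average price of a good is estimated as a ratio of two estimated totals, these index estimators are ratio-type estimators and are only approximately unbiased, even when the totals themselves are design-unbiased.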
Next, consider the possibility of estimating the price indexes exclusively using store-level data. Let s_{0} and s_{t} denote probability samples of stores, let the subscript k index the store, and let W_{k0} and W_{kt} denote survey weights corresponding to the unbiased estimator of a population total at times 0 and t, respectively. It is easy to imagine estimators
\hat{Y}_{gt} = \sum_{k \in s_{t}} W_{kt} Y_{gkt}
of total dollar volume and
\hat{Q}_{gt} = \sum_{k \in s_{t}} W_{kt} Q_{gkt}
of total unit volume, where Y_{gkt} and Q_{gkt} are the dollar and unit sales of good g by store k in period t, with analogous estimators for the base period. These estimators obviously require data on prices and unit volume by good at the store level. Current BLS surveys do not collect such data, but surveys based upon scanning technology could produce these data, at least for a subset of goods in a subdomain of stores. Given the estimated dollar and unit totals for both periods, it is possible to estimate the plutocratic price indexes.
The question is whether the price indexes estimated on the basis of store data really estimate T_{Lp}, T_{Pp}, and T_{Fp}. One would anticipate some biases due to such factors as

goods purchased from stores for business use, not for home consumption;

shrinkage due to breakage and pilferage (this component of bias would depend on the mode of data collection); and

coverage errors in the store sampling frame (i.e., missing stores in which consumers shop and including stores in which they do not).
Regrettably, it is not possible to estimate the democratic index I_{Lp} exclusively from store-level data, at least not without additional assumptions. The democratic weights are population means per HH, and HH data are necessary to estimate the means unbiasedly; such data are not usually available from stores (although some store chains have adopted ID card programs that allow tracking of purchases by consumer).
It may be possible to approximate the democratic index from store-level data with periodic adjustment of the weights. This possibility exploits the relationship between plutocratic and democratic weights set forth above. From store-level data, one can construct an estimator of the plutocratic weights
\hat{w}_{g0} = \hat{Y}_{g0} / \sum_{h} \hat{Y}_{h0}.
Then we define the estimator of the democratic weights as
\hat{\bar{D}}_{g0} = \hat{w}_{g0} - c(D_{gi0}, Y_{+i0}) / \hat{\bar{Y}}_{+0},
where the adjustment factor is the second term on the right side, developed from an independent HH survey, such as the Consumer Expenditure Survey. In this factor, c(D_{gi0}, Y_{+i0}) is an estimator of the covariance between HH share and total HH consumption, and \hat{\bar{Y}}_{+0} is an estimator of mean total consumption per HH in the base period. It does not seem necessary to estimate the adjustment factor for each time period (month) for which the price index is produced. It might be acceptable to update the adjustment factor only infrequently.
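The adjustment can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name and all input values are hypothetical, with the plutocratic weights imagined as coming from store data and the covariance and mean consumption estimates as coming from a separate HH survey. Each democratic weight is approximated by subtracting the estimated covariance, scaled by estimated mean HH consumption, from the plutocratic weight.

```python
def approximate_democratic_weights(pluto_hat, cov_hat, mean_total_hat):
    """Plutocratic weight minus the covariance adjustment, good by good.

    pluto_hat:      estimated plutocratic weights by good (store data)
    cov_hat:        estimated covariance between HH share and total HH
                    consumption, by good (HH survey)
    mean_total_hat: estimated mean total consumption per HH (HH survey)
    """
    return {g: pluto_hat[g] - cov_hat[g] / mean_total_hat for g in pluto_hat}

# Hypothetical inputs, invented for illustration.
pluto_hat = {"food": 0.50, "autos": 0.50}
cov_hat = {"food": -40.0, "autos": 40.0}
mean_total_hat = 800.0

demo_hat = approximate_democratic_weights(pluto_hat, cov_hat, mean_total_hat)
print(demo_hat)
```

HH shares sum to one within each household, so the population covariances sum to zero across goods and the adjustment leaves the sum of the weights unchanged. Estimates drawn from different sources need not satisfy this exactly, and an adjusted weight could even turn out negative, so a check or renormalization step may be worth considering in practice.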
Without question, one can imagine other hybrid schemes for estimating plutocratic or democratic price indexes. BLS’s current method is an outstanding example, with quantity weights coming from one survey and monthly prices from another.