Reasons Why Median Sale Price is Flawed

Here at Zillow, we look at the housing market in a unique way. Instead of focusing on median sale price, as many other housing market trackers do, we focus on the Zillow Home Value Index. There are several reasons why this gives a more complete picture of an area's housing market.

When the California Association of Realtors (CAR) released January 2007 median sale price data, which showed a sharp decline in home prices over the past year, CAR President William E. Brown noted, "Sales do appear to be edging up, but recent declines in the median price have been due to a lack of sales in the over $500,000 range, where funds are extremely scarce and jumbo loan rates are at near-record margins compared to conforming loan rates."

An article reporting the results noted that "large changes in local median home prices typically indicate both local home price appreciation, and often, large shifts in the composition of housing market activity. Some of the variations in median home prices for January may be exaggerated due to compositional changes in housing demand."

To get a sense of how the mix of sold homes can bias the median sale price, and of how large that bias can be, let's look at one area in California that illustrates the problem well: Orange County. In the chart below, we've taken Orange County data for 2006 and 2007 and divided all the homes in the county into quartiles, that is, four buckets ranked by 2006 Zestimate value, with the same number of homes in each bucket:

  • The first quartile contains the 25% of homes with the lowest Zestimate values;
  • The second quartile contains the 25% of homes with Zestimate values between the 25th and 50th percentiles;
  • The third quartile contains the 25% of homes with Zestimate values between the 50th and 75th percentiles; and,
  • The fourth quartile contains the 25% of homes with the highest Zestimate values.

We've then displayed the number of homes in each quartile that sell each quarter as a percentage of all homes sold in a given quarter.

If the homes that sold were a representative sample of all homes in Orange County, we'd expect the bar for each quartile in a given quarter to be the same height (exactly 25%). If that were what we found, then the median sale price among those homes would be a good measure of the median value of all homes in the quarter, and changes in the median across quarters would correspond to changes in home values alone (and not to changes in the mix of homes sold). Unfortunately, the quartiles are not equally represented each quarter. In fact, sales were weighted toward the lower quartiles throughout 2006 and began to shift significantly toward the higher quartiles in 2007.
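The quartile comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the valuations are made-up numbers (not Orange County data), and the helper names `quartile_cutoffs`, `quartile_of`, and `sales_share_by_quartile` are hypothetical.

```python
from collections import Counter

def quartile_cutoffs(values):
    """Return three cut points that split the values into four equal-size buckets."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[n * k // 4] for k in (1, 2, 3)]

def quartile_of(value, cutoffs):
    """Quartile 1 holds the lowest-valued homes, quartile 4 the highest."""
    return 1 + sum(value >= c for c in cutoffs)

def sales_share_by_quartile(all_values, sold_values):
    """Share of a period's sales that falls in each quartile of ALL homes."""
    cutoffs = quartile_cutoffs(all_values)
    counts = Counter(quartile_of(v, cutoffs) for v in sold_values)
    total = len(sold_values)
    return {q: counts.get(q, 0) / total for q in (1, 2, 3, 4)}

# Hypothetical fixed-date valuations (in $1,000s) for every home in an area.
all_homes = [300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850]
# Hypothetical valuations of the homes that happened to sell in one quarter.
sold = [300, 350, 400, 450, 500, 700]

shares = sales_share_by_quartile(all_homes, sold)
for q in (1, 2, 3, 4):
    print(f"Quartile {q}: {shares[q]:.0%} of sales")
```

A representative sample would put 25% of sales in each quartile. In this made-up quarter, half of the sales come from the lowest quartile and none from the highest, so the sold homes skew toward the low end.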

The result of this pattern is that the median sale price would be lower in 2006 simply because more low-priced homes than high-priced homes sold in that period. Conversely, median sale prices in 2007 would be somewhat higher because a disproportionate number of higher-priced homes sold.
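A small worked example shows how large this effect can be even when no home changes in value. The figures below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from statistics import median

# Hypothetical home values (in $1,000s). No home changes value between periods.
low_tier = [300, 320, 340, 360]
high_tier = [700, 720, 740, 760]

# Period A: mostly lower-priced homes sell.
period_a_sales = low_tier[:3] + high_tier[:1]   # 300, 320, 340, 700
# Period B: mostly higher-priced homes sell.
period_b_sales = low_tier[:1] + high_tier[:3]   # 300, 700, 720, 740

median_a = median(period_a_sales)
median_b = median(period_b_sales)
change = (median_b - median_a) / median_a

print(f"Median sale price, period A: {median_a}")
print(f"Median sale price, period B: {median_b}")
print(f"Apparent change: {change:.0%}")
```

The median sale price more than doubles between the two periods, yet every individual home is worth exactly what it was before. The entire move comes from the mix of homes sold.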

To see the precise impact that this change in the composition of sold homes has on the median sale price, see the second chart below.

This chart shows three series of numbers:

  • The first is the simple median sale price of homes sold in each period (blue line).
  • The second (red line) is the median Zestimate, as of a fixed date in 2006, among the homes that sold in each period. That is, for this series, the Zestimate for a given home is constant across time, and the home's Zestimate is used to compute the median for a given period only if the home sells in that period. As such, variation in this line over time simply reflects the change in the composition of homes selling in each period (e.g., a trend towards less or more expensive homes selling). It ignores all changes in the values of specific homes over time and reflects only changes in the overall mix of homes selling.
  • The third series (green line) is the Zillow Home Value Index for Orange County.

As noted in a blog post I wrote, the index essentially controls for changes in the mix of sold homes and instead reflects just the changes in actual home values. The median sale price, on the other hand, is a function of both changing home values and the changing composition of sold homes.

For ease of comparison, all three series have been normalized to an index with the base period (100) being the first quarter of 2006. Also, the gray bars at the bottom of the chart show the composition of homes sold by quartile from the previous chart.
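To illustrate how the first two series and the normalization fit together, here is a minimal sketch using invented sales records. The function name `normalize_to_base` and all figures are assumptions for illustration. The Zillow Home Value Index itself is not reproduced here, since its calculation is not described in this article.

```python
from statistics import median

def normalize_to_base(series, base=100.0):
    """Rescale a series so that its first period equals `base`."""
    first = series[0]
    return [base * x / first for x in series]

# Hypothetical sales: each tuple is (sale price, fixed-date valuation), in $1,000s.
sales_by_quarter = {
    "Q1": [(400, 400), (500, 500), (600, 600)],
    "Q2": [(510, 500), (610, 600), (720, 700)],  # pricier homes sell
    "Q3": [(420, 400), (530, 500), (630, 600)],  # mix returns to the Q1 mix
}
quarters = list(sales_by_quarter)

# Series 1: simple median sale price of the homes sold in each quarter.
median_price = [median(p for p, _ in sales_by_quarter[q]) for q in quarters]
# Series 2: median fixed-date valuation of the homes sold in each quarter.
median_fixed = [median(v for _, v in sales_by_quarter[q]) for q in quarters]

price_index = normalize_to_base(median_price)
mix_index = normalize_to_base(median_fixed)

for q, p, m in zip(quarters, price_index, mix_index):
    print(f"{q}: median sale price index {p:.1f}, fixed-valuation index {m:.1f}")
```

In this toy data the fixed-valuation series moves from 100 to 120 and back to 100 purely because a different set of homes sold in Q2, and the median sale price series follows it closely.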

So what do these three series of numbers tell us? First, if the composition of homes sold were constant over time, we would expect the median 2006 Zestimate (red line) to be a relatively flat line. This is because the series is based on a fixed Zestimate that does not change over time. If the same types of homes sold in two different periods, the medians from those two sets of homes would be roughly the same. But, as we can see, this line is not flat. Instead, it moves in rough correspondence with the changes in the composition of sold homes depicted by the gray bars below it.

From Q1 2006 to Q2 2006, we see that the distribution of sold homes shifts to the right (i.e., relatively fewer less-expensive homes sold) and the median 2006 Zestimate moves up. In Q3 2006, the distribution shifts left and we see the median 2006 Zestimate move down. As the distribution again shifts right from the end of 2006 through much of 2007, we see the median 2006 Zestimate move up again.

The Zillow Home Value Index, on the other hand, shows little correspondence to the changing mix of sold homes over time (again, because it measures only changes in home values, not changes in the types of homes sold). But what about the median sale price? What's striking here is how closely the median sale price tracks the median 2006 Zestimate, a strong indication that, over this time period, it is measuring the composition of sold homes more than the actual change in home values (when the latter is all we really want to measure). This holds across the time periods up until the final quarter of 2007, when the actual decline in home values (as shown by the declining home value index line) becomes substantial enough to register even in the median sale price.

What does this all mean? In short, the median sale price is generally not a very reliable indicator of home value changes over time. The reason for this shortcoming is that it conflates two independent characteristics of the housing stock – changes in home values and changes in the mix of homes sold – when we really only want to measure one characteristic – changes in home values.

The good news is that we've devised the Zillow Home Value Index in order to take care of this precise issue. By valuing all homes and statistically adjusting for homes that might be under- or over-represented in the set of sold homes in a given period, the index provides a reliable and accurate measure of changes in home values over time.
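The idea of valuing all homes, rather than only those that sold, can be illustrated with a toy comparison. This is not Zillow's actual methodology, which is not detailed in this article. It is a simplified sketch with hypothetical valuations in which every home gains 2% between two periods.

```python
from statistics import median

# Hypothetical valuations (in $1,000s) for every home in period 1.
values_p1 = {"a": 300, "b": 400, "c": 500, "d": 600, "e": 700}
# Assumption for illustration: every home appreciates 2% by period 2.
values_p2 = {home: v * 1.02 for home, v in values_p1.items()}

# Different homes sell in each period (cheaper ones first, pricier ones later).
sold_p1 = ["a", "b", "c"]
sold_p2 = ["c", "d", "e"]

def pct_change(old, new):
    return (new - old) / old

# Median over sold homes only: mixes value changes with composition changes.
sold_change = pct_change(
    median(values_p1[h] for h in sold_p1),
    median(values_p2[h] for h in sold_p2),
)
# Median over all homes: unaffected by which homes happened to sell.
all_change = pct_change(median(values_p1.values()), median(values_p2.values()))

print(f"Change in median of sold homes: {sold_change:.0%}")
print(f"Change in median of all homes:  {all_change:.0%}")
```

The median across all homes recovers the 2% change in values, while the median across sold homes reports a far larger jump that is almost entirely a shift in the mix.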

By Diane Tuman

This article was adapted from this Zillow Blog post by Zillow Vice President of Data & Analytics Stan Humphries.

