Upgrading the Zestimate

Posted by: Stan Humphries    Posted date: June 14, 2011

Yesterday afternoon, we shipped Zestimates produced from an entirely new algorithm to the website.  This marks our third algorithm since Zillow’s launch in 2006.  The Zestimates from the new algorithm differ from the previous ones in two important ways.  First, there are a lot more of them: 97 million in total, to be precise.  Yesterday, we had only 72 million Zestimates, but new modeling approaches and better use of user-added data have allowed us to produce reliable home valuations in more areas than before.

Second, these new Zestimates are more accurate than the older ones. For the three-month period ending March 31, 2011, our national median error was 8.5 percent.  The median error of our previous algorithm over the same period was 12.7 percent, meaning that the new Zestimates are 33 percent more accurate than the older ones.  Some of the biggest gainers in terms of accuracy were the metros of Denver, Minneapolis, Phoenix, Portland, Atlanta, Philadelphia and Detroit.  See the table below for a before-and-after comparison of accuracy for the top metros (sorted by the improvement in median error).  Our accuracy for all counties across the country can be found here.

We hope you enjoy the new Zestimates. As with our previous two algorithms, we’ll now start tweaking this one to fix the little issues that Zillow users find, while researching entirely new approaches that will, in time, become version four of our valuation algorithm. The progress continues.

Measuring our accuracy

Our core metric for measuring accuracy is median absolute percent error, which measures the percent difference between the Zestimate and the actual sale price.  To compute accuracy metrics, we look at transactions over a three-month window and pair each transaction in that window with the Zestimate generated for that home immediately prior to the sale date.  For example, if a home sells on February 15, 2011, we pair that sale with the Zestimate produced on February 12. In addition to median error, we also report the percent of transactions in the period that fall within 5%, 10% and 20% of the actual sale price. Since so much of Zillow’s mission is about increasing the transparency of data in the real estate market, we apply the same standard to our own data and, by doing so, help consumers and professionals better understand the performance of our models across all geographies.
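For illustration, here is a minimal sketch of how these metrics can be computed from (Zestimate, sale price) pairs. The function name and the toy numbers are ours, not part of Zillow's actual pipeline; the only assumption carried over from the text is that each sale is paired with the Zestimate issued just before its sale date.

```python
import statistics

def accuracy_metrics(pairs):
    """Compute median absolute percent error and within-5/10/20% shares.

    `pairs` holds (zestimate, sale_price) tuples, one per transaction in the
    three-month window, each Zestimate taken just before its sale date.
    """
    errors = [abs(zest - price) / price for zest, price in pairs]
    return {
        "median_abs_pct_error": statistics.median(errors),
        "within_5pct": sum(e <= 0.05 for e in errors) / len(errors),
        "within_10pct": sum(e <= 0.10 for e in errors) / len(errors),
        "within_20pct": sum(e <= 0.20 for e in errors) / len(errors),
    }

# A $273,000 Zestimate on a $300,000 sale is a 9% error, so it counts toward
# the within-10% and within-20% shares but not the within-5% share.
print(accuracy_metrics([(273_000, 300_000), (198_000, 200_000), (450_000, 375_000)]))
```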

Historical Zestimates

With the data delivery yesterday, we have re-computed historical Zestimates back to January 2006. There are a few algorithmic developments that we are still working on and, once these are done, we intend to re-compute all Zestimates back to 1996. We have re-computed Zestimates nationally only once before, with the second version of our algorithm, released in January 2008.

Implementation and Deployment

As with Rent Zestimates, which we released in March, our new Zestimate algorithm is designed to run routinely on Amazon Web Services Elastic Compute Cloud machines. The old algorithm utilized more than 500,000 statistical and machine learning models, whereas the new algorithm generates more than 1.2 million models.  These models draw on 3.2 terabytes of data – the equivalent of more than a third of the entire printed collection of the Library of Congress – and are created, used and then thrown away before the next run.  Three times a week, we repeat this process to create new Zestimates across our entire data footprint.
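To make the create-use-discard pattern concrete, here is a hypothetical sketch of one batch run, assuming homes are partitioned into local markets and each market gets its own throwaway model. The names (`fit_local_model`, `run_batch`) and the simple price-per-square-foot "model" are illustrative only; the post does not describe the actual model types or partitioning.

```python
import statistics

def fit_local_model(recent_sales):
    # Stand-in for one of the ~1.2 million models: median price per square foot
    # estimated from recent sales in this local market.
    ppsf = statistics.median(price / sqft for price, sqft in recent_sales)
    return lambda home_sqft: ppsf * home_sqft

def run_batch(markets):
    """One of the thrice-weekly runs over the entire data footprint.

    `markets` maps a market id to (recent_sales, homes), where recent_sales is
    a list of (price, sqft) pairs and homes is a list of (home_id, sqft) pairs.
    """
    valuations = {}
    for market_id, (recent_sales, homes) in markets.items():
        model = fit_local_model(recent_sales)        # create the model
        for home_id, sqft in homes:
            valuations[home_id] = model(sqft)        # use it to value homes
        del model                                    # throw it away before the next run
    return valuations

# Example with one tiny market:
print(run_batch({"98101": ([(300_000, 1500), (450_000, 2000)], [("home-1", 1800)])}))
```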


About the author
Stan Humphries
Stan is Zillow's Chief Economist. To learn more about Stan, click here.