Answers (4)
Best Answer

- Pasadenan
- Contributions:21426
Zillow has altered the methodology at least 5 times since they started publishing estimates; one modification was so major that they recalculated the entire estimate history for continuity.
They make minor tweaks to the method as they find solutions to issues where they are not getting as good a curve fit as they want.
As for "studies": I downloaded the "for sale" data for my city and plotted the % difference between Zestimate and LISTING price as a function of LISTING price, with different colors for estimates that were too high (green) and too low (red). The "average" for listings under $2 million is only a 1.5% difference (too low), but the median sales price in the area is about 3% lower than the median list price, so the curve fit is "not bad" for the under $2 million range. Above $2 million it consistently underestimates, mostly because there are not enough sales in that range to properly weight the factors for these higher-value homes.

Web address for full size image to read the numbers:
http://photos3.zillow.com/is/image/i0/i5/i7519/IS1ht1e6q21e0f7.jpg
Enlarged; for the ≤$1 million listings, ±40% difference range:

web address for full sized image:
http://photos3.zillow.com/is/image/i0/i5/i7521/ISg9mmsr5txjmr.jpg
When I started clicking on some of the listings where the Zestimate was 40% or more too high, guess what? As expected, they are described as "fixers", and they certainly look like it, not to mention various neighbor problems visible from the aerial views.
A curve fit is a curve fit and can't take into account changes in condition over time.
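For anyone who wants to repeat this kind of comparison on their own download, here is a minimal Python sketch of the percent-difference calculation described above. The function names, the made-up sample numbers, and the choice to measure the difference relative to the listing price are assumptions for illustration, not the original spreadsheet.

```python
from statistics import mean

def pct_diff(zestimate, list_price):
    """Signed % difference of the estimate relative to the listing price.

    Positive means the estimate is above the listing price (too high),
    negative means it is below (too low).
    """
    return 100.0 * (zestimate - list_price) / list_price

def summarize(listings, cutoff=2_000_000):
    """Average signed % difference below and at/above a listing-price cutoff."""
    below = [pct_diff(z, p) for z, p in listings if p < cutoff]
    above = [pct_diff(z, p) for z, p in listings if p >= cutoff]
    return {
        "below": mean(below) if below else None,
        "above": mean(above) if above else None,
    }

# Made-up (zestimate, list_price) pairs, for illustration only.
sample = [(490_000, 500_000), (820_000, 800_000), (2_400_000, 3_000_000)]
result = summarize(sample)
print(result)
```

Splitting at a cutoff like this is what makes a low-volume price range stand out: an average over all listings can look fine while one price band is consistently off.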

- Stan Humphries
- Contributions:50
Hi jrtpapa. We've had two major versions of the Zestimate algorithm now with the third to debut in the coming months (don't worry, we'll let you know when it hits the site). As Pasadenan said, we've re-estimated complete history once previously (and will do so again) and make periodic tweaks to the base algorithm, sometimes globally, sometimes in just a few counties.
The models are self-learning and designed to correct for systematic error (or what you refer to as "consistent bias") so, in general, one should find the pattern seen in Pasadenan's charts, assuming one is looking at a sufficiently large sample of homes. Specifically that means actual sales that are equally above and below their estimated value. When looking at a smaller region of homes or a narrow, non-random subset of homes, it is conceivable that one will find a biased pattern such as you report.
As wetdawgs notes, we regularly update our accuracy stats but these only contain information on absolute error (how far from the actual sale price the estimate is regardless of the direction of the error), not raw error (which indicates systematic error). Since the latter is nominally zero and is fairly confusing to most people (not anyone on this current thread apparently), it is not reported.
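To make the distinction between absolute error and raw error concrete, here is a small Python sketch. The function names and sample numbers are hypothetical, and using the median as the summary statistic is an assumption made for this illustration, not a statement of how the published accuracy stats are computed.

```python
from statistics import median

def raw_errors(pairs):
    """Signed % error of each estimate relative to the actual sale price."""
    return [100.0 * (est - sale) / sale for est, sale in pairs]

def error_summary(pairs):
    """Summarize direction (raw error) and size (absolute error) separately."""
    errs = raw_errors(pairs)
    return {
        # Near zero when estimates are as often too high as too low.
        "median_raw_error": median(errs),
        # Typical size of a miss, ignoring its direction.
        "median_abs_error": median([abs(e) for e in errs]),
    }

# Made-up (estimate, sale_price) pairs, for illustration only.
sales = [(110, 100), (90, 100), (105, 100), (95, 100)]
summary = error_summary(sales)
print(summary)
```

In this sample the raw error centers on zero even though every individual estimate misses, which is why a raw error near zero says nothing about how large the typical miss is.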
While we don't often discuss too many details about our models, I can't help but reveal that linear regression is not among the techniques we utilize and almost all of our models are trained at the sub-county level (and we use a lot of them; 334,000 created each night while producing Zestimates).
Hope this helps.

- Pasadenan
- Contributions:21426
I know you didn't ask about the method, but it is "Multiple Linear Regression" of the 9 county records data items against the "recently sold" prices within a given distance of the property: more distance for rural areas, less distance for high-density urban areas.
After calculating the coefficients, they test the resulting curve fit against the recently sold prices, and if some are way out of line, they eliminate those properties and recalculate the coefficients. They also indicated they exclude foreclosure sales data. The new equation is then used to calculate the estimated values. Areas that they "tweak" without telling anyone include the specific terms they use for the curve fit, and whether they use higher-order terms. Another area they tweak is the rules for converting a tax assessed value to something that can be used in the curve fit based on local property tax rules. Another is how they carry a prior sale value forward based on date of sale for use in the curve-fitting program, since the inflation rate is not at all a flat line.
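The fit, trim, and refit loop as described in this post can be sketched in plain Python. This is only an illustration of the procedure as described here, not a claim about what Zillow actually runs. The two features (square footage in thousands, bedrooms), the 25% "way out of line" threshold, the sample data, and all function names are assumptions.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if the system is singular.
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit(rows, prices):
    """Least-squares coefficients (intercept first) via the normal equations."""
    xs = [[1.0] + list(r) for r in rows]
    k = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * p for x, p in zip(xs, prices)) for i in range(k)]
    return solve(xtx, xty)

def predict(coef, row):
    """Fitted price for one property."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def fit_with_trim(rows, prices, max_pct_off=25.0):
    """Fit, drop sales whose fitted value is far from the price, then refit."""
    coef = fit(rows, prices)
    kept = [(r, p) for r, p in zip(rows, prices)
            if abs(predict(coef, r) - p) / p * 100.0 <= max_pct_off]
    refit = fit([r for r, _ in kept], [p for _, p in kept])
    return refit, len(kept)

# Made-up recent sales: (sqft in thousands, bedrooms) and price in $1000s.
# Four follow price = 100 + 200*sqft + 50*beds; the third is an outlier.
rows = [(1.0, 2), (1.5, 3), (2.0, 3), (2.5, 3), (3.0, 4)]
prices = [400.0, 550.0, 1050.0, 750.0, 900.0]
coef, n_kept = fit_with_trim(rows, prices)
print(coef, n_kept)
```

A single trimming pass is used here for brevity; whether trimming is repeated, and what counts as "way out of line", are not stated in the thread.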
And as previously mentioned, in the under $2 million range, the resulting estimate is just as likely to be too high as it is to be too low, mostly based on changes of condition of the property, or differences in things like ceiling height and views and insulation, even though those items are partially addressed by the last sale price & date, and by the tax assessed value.
One area they are looking into tweaking is how to deal with remodels and rebuilds, since those do not change the purchase price and, in many areas, do not fully change the tax assessed value.

- wetdawgs
- Contributions:26784
Zillow (on this site) has a large selection of statistics on Zestimate vs. actual sales prices for various communities. In most communities, about 70% of houses sell within ±20% of their Zestimates.
Sorry, hopefully Zillow will comment on how often the formula is tweaked.
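The "within ±20%" figure mentioned above is a simple share, which can be sketched in Python as follows. The function name and sample numbers are hypothetical, and measuring the band relative to the estimate (rather than the sale price) is an assumption based on the wording in this post.

```python
def share_within(pairs, tolerance_pct=20.0):
    """Fraction of sales whose price falls within ±tolerance_pct of the estimate."""
    if not pairs:
        raise ValueError("need at least one (estimate, sale_price) pair")
    hits = sum(
        1 for est, sale in pairs
        if abs(sale - est) <= est * tolerance_pct / 100.0
    )
    return hits / len(pairs)

# Made-up (estimate, sale_price) pairs in $1000s, for illustration only.
sold = [(100, 115), (100, 130), (200, 165), (300, 310)]
share = share_within(sold)
print(share)
```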


Research on the accuracy of Zestimates?