Zillow Home Value Forecast: Methodology

Posted by: Andrew Bruce    Posted date: January 24, 2013

The Zillow Home Value Forecast is Zillow's forecast of the change in the Zillow Home Value Index (ZHVI) over the next 12 months. The ZHVI itself is a time series tracking the monthly median home value in a particular geographical region; the methodology behind the index is described in a separate research brief. This research brief describes the forecasting models used to produce the Zillow Home Value Forecast.

Geographic Coverage of Forecasts

Zillow produces forecasts for most geographic levels for which the ZHVI is available, including core based statistical areas (CBSAs), states, counties, cities, neighborhoods and ZIP codes. At present (December 2012), this includes coverage for:

  • 259 CBSAs
  • 41 states
  • 560 counties
  • 8,407 cities
  • 5,149 neighborhoods
  • 10,264 ZIP codes

The precise number of geographies changes from month to month because some regions are added and/or dropped depending on the availability of timely and reliable data.

Historical Forecast Accuracy

The predictive accuracy of the one-year forecasts was assessed by back-testing the model over the past six years. Back-testing consists of running consecutive forecasts on historical data, where each forecast is produced using only the data available at the forecast date and is then compared against the out-of-sample value observed 12 months later. The above table summarizes the average median percentage error of the 12-month forecast for the various geographic regions. The average is taken over all regions within a region type. This forecast error is compared to that of a naïve forecast based on a simple random walk model. The most accurate forecast, and the biggest improvement over the naïve forecast, is for CBSA regions. This is largely because the most complete economic data are available for those regions.
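The back-testing procedure can be sketched as a rolling-origin loop. This is a minimal illustration, not Zillow's evaluation code: the error metric (median absolute percentage error), the `drift_forecast` stand-in model, the function names and the index values are all assumptions made for the example.

```python
from statistics import median

def random_walk_forecast(history, horizon):
    """Naive benchmark: the value `horizon` months ahead equals the last observation."""
    return history[-1]

def drift_forecast(history, horizon, window=12):
    """Illustrative stand-in model: extend the average monthly change of the last `window` months."""
    step = (history[-1] - history[-1 - window]) / window
    return history[-1] + horizon * step

def backtest(series, forecaster, horizon=12, min_history=24):
    """Rolling-origin back-test; returns the median absolute percentage error."""
    errors = []
    for origin in range(min_history, len(series) - horizon + 1):
        history = series[:origin]               # data available at the forecast date
        actual = series[origin + horizon - 1]   # out-of-sample value `horizon` months later
        predicted = forecaster(history, horizon)
        errors.append(abs(predicted - actual) / abs(actual))
    return median(errors)

# Made-up monthly index values for six years of steady growth (not Zillow data).
index = [100.0 + 0.5 * month for month in range(72)]
print("random walk:", backtest(index, random_walk_forecast))
print("drift model:", backtest(index, drift_forecast))
```

On a steadily trending series like this one, the random walk benchmark misses the trend at every origin, which is the kind of gap the comparison with a naïve forecast is meant to expose.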

Core Model: CBSA

The core forecasting model is at the CBSA geographic level.  The CBSA forecasts are an ensemble estimate formed by combining the estimates from a univariate time series model with those from an economic leading indicator model.  The time series model captures the unique temporal movements of each series, whereas the economic model captures broader movements in the economy.  By combining the two approaches, we get better performance than from either model alone.  In addition to combining the two different modeling approaches, the ensemble forecast also includes the estimates from the prior quarter.  This results in more stable forecasts that change less from quarter to quarter.

Ensemble Estimation Using Model Stacking

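At each time point $t$, we have $M$ forecast models $\hat{y}_{m,t}$ with $m = 1, \dots, M$.  For example, our current forecast has two completely different models:  a univariate time series model and a leading indicator model.  A generic and simple way to combine models is through “stacking.”  The stacked estimate is

$$\hat{y}_t = \sum_{m=1}^{M} w_m \, \hat{y}_{m,t},$$

where the weights $w_m$ are chosen to minimize the squared error of the stacked estimate against the observed values $y_t$,

$$\sum_t \left( y_t - \sum_{m=1}^{M} w_m \, \hat{y}_{m,t} \right)^2,$$

and are constrained to be non-negative, $w_m \ge 0$.  This minimization can be solved using a quadratic program.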

As indicated above, at each quarter, we have four different candidate models:

  1. A leading indicator economic model for the current quarter.
  2. A univariate time series model for the current quarter.
  3. A leading indicator economic model for the previous quarter.
  4. A univariate time series model for the previous quarter.

A single set of four weights, one per candidate model, is fit for all CBSAs using data from the previous four years.
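The stacking step can be sketched as a least-squares fit with non-negative weights. The source solves this with a quadratic program; the sketch below swaps in a different technique that gives the same answer for a handful of models: it fits ordinary least squares on every subset of models and keeps the best solution whose weights are all non-negative (15 subsets for four models). Whether the weights are also constrained to sum to one is not stated in the source, so no such constraint is assumed. All function names and numbers are hypothetical.

```python
from itertools import combinations

def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None  # singular system, e.g. two identical forecasts
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def sse(weights, forecasts, actuals):
    """Sum of squared errors of the weighted combination of forecasts."""
    return sum((y - sum(w * f for w, f in zip(weights, row))) ** 2
               for row, y in zip(forecasts, actuals))

def stack_weights(forecasts, actuals):
    """Non-negative stacking weights; each row of `forecasts` holds one forecast per model."""
    n_models = len(forecasts[0])
    best = [0.0] * n_models
    best_sse = sse(best, forecasts, actuals)
    for size in range(1, n_models + 1):
        for subset in combinations(range(n_models), size):
            # Normal equations restricted to the models in this subset.
            gram = [[sum(row[i] * row[j] for row in forecasts) for j in subset] for i in subset]
            rhs = [sum(row[i] * y for row, y in zip(forecasts, actuals)) for i in subset]
            solution = solve_linear(gram, rhs)
            if solution is None or min(solution) < 0:
                continue  # skip singular or infeasible (negative-weight) fits
            candidate = [0.0] * n_models
            for i, w in zip(subset, solution):
                candidate[i] = w
            candidate_sse = sse(candidate, forecasts, actuals)
            if candidate_sse < best_sse:
                best, best_sse = candidate, candidate_sse
    return best

# Made-up 12-month forecasts (percent change) from four candidate models, one row per observation.
forecasts = [[2.1, 1.8, 2.5, 1.5], [3.0, 2.6, 2.8, 2.9], [-1.0, -0.4, -0.2, -0.9],
             [0.5, 1.1, 0.9, 0.2], [4.2, 3.5, 3.9, 3.1], [1.4, 1.9, 1.0, 2.2]]
actuals = [2.0, 2.9, -0.8, 0.7, 3.9, 1.6]
print(stack_weights(forecasts, actuals))
```

Subset enumeration grows exponentially with the number of models, so a proper quadratic programming or non-negative least squares solver would be the natural choice beyond a few candidates.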

Leading Indicator Economic Model

The leading indicator economic model follows the approach of Shan and Stehn (2011) and Chen et al. (2011), who apply a structural econometric model to forecast housing prices.  This type of model, which dates back to the seminal paper by Capozza, Hendershott and Mack (2004), allows for serial correlation and mean reversion to a long-term price trend, reflecting the illiquid nature of the housing market.

Long-Term Price Model

Over the long term, housing prices are assumed to change according to the household income, cost to the owner, construction cost and availability of land.  This is reflected in the following equation

where m is the CBSA, t is the quarter and

User cost, which is the cost of owning a home, is defined as

where

Short-Term Error Correction Model

Short-term price dynamics are controlled by three main components:  short-term price momentum, price correction toward the long-term fundamental price and technical economic factors.  The forecasting equation is

where

Econometric Model Fitting Details

The fundamental long-term price model was fit using a panel regression model with fixed CBSA effects and pooled effects across all CBSAs for the other variables.  The predicted values from this model were used as input to the error correction model (ECM).  The ECM was fit separately to seven different geographically similar regions with fixed CBSA effects and pooled effects for other variables with the pooling occurring across all CBSAs within each region. The “plm” software package in R was used to fit the models (Croissant and Millo, 2008).
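To illustrate what "fixed CBSA effects and pooled effects for the other variables" means, the sketch below fits a separate intercept for each CBSA and a single shared slope using the within (demeaning) transformation, which is the estimator plm calls the "within" model. It is a simplified illustration with one regressor, not the models described above; the function name, variable roles and data are assumptions.

```python
from collections import defaultdict

def fit_fixed_effects(rows):
    """Within estimator for y = a_cbsa + b * x.

    rows: iterable of (cbsa, x, y) observations.
    Returns (pooled slope b, {cbsa: intercept a_cbsa}).
    """
    groups = defaultdict(list)
    for cbsa, x, y in rows:
        groups[cbsa].append((x, y))

    # Per-CBSA means, used to remove each CBSA's fixed effect.
    means = {}
    for cbsa, obs in groups.items():
        mean_x = sum(x for x, _ in obs) / len(obs)
        mean_y = sum(y for _, y in obs) / len(obs)
        means[cbsa] = (mean_x, mean_y)

    # Pooled slope from the demeaned data of all CBSAs together.
    sxy = sxx = 0.0
    for cbsa, obs in groups.items():
        mean_x, mean_y = means[cbsa]
        for x, y in obs:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2
    slope = sxy / sxx

    # Each CBSA's intercept is recovered from its own means.
    intercepts = {cbsa: mean_y - slope * mean_x for cbsa, (mean_x, mean_y) in means.items()}
    return slope, intercepts

# Made-up panel: x could stand for log income and y for log price (hypothetical roles).
panel = [("cbsa_a", 1.0, 1.8), ("cbsa_a", 2.0, 2.6), ("cbsa_a", 3.0, 3.4),
         ("cbsa_b", 2.0, 4.6), ("cbsa_b", 4.0, 6.2), ("cbsa_b", 6.0, 7.8)]
slope, intercepts = fit_fixed_effects(panel)
print(slope, intercepts)
```

Fitting the ECM separately by region, as described above, would amount to running a fit of this kind once per group of CBSAs, so that the slope is pooled only within each region.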

Time Series Models

To complement the leading indicator panel regression model, a univariate time series model is fit for each CBSA.  After comparing a variety of model choices, including ARIMA and structural models, the time series model used was a double exponential smooth with a damped trend.  The ability to dampen the trend was useful to prevent the model from overshooting. The parameters were fit by maximum likelihood using the “forecast” software package in R (Hyndman, 2012).
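A double exponential smooth with a damped trend can be sketched as follows. This is a minimal illustration of the standard damped-trend recursions with fixed smoothing parameters; the source fits the parameters by maximum likelihood, which is not reproduced here. The parameter values, the initialization and the data are assumptions.

```python
def damped_trend_forecast(series, horizon, alpha=0.8, beta=0.2, phi=0.9):
    """Damped-trend double exponential smoothing.

    alpha: level smoothing, beta: trend smoothing, phi: damping factor (phi = 1 means no damping).
    Returns a list of forecasts for steps 1..horizon.
    """
    # Simple initialization (an assumption): first value and first difference.
    level = series[0]
    trend = series[1] - series[0]
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (previous_level + phi * trend)
        trend = beta * (level - previous_level) + (1 - beta) * phi * trend

    forecasts = []
    damping_sum = 0.0
    for h in range(1, horizon + 1):
        damping_sum += phi ** h          # phi + phi^2 + ... + phi^h
        forecasts.append(level + damping_sum * trend)
    return forecasts

# Made-up monthly index with steady growth (not Zillow data).
index = [100.0 + 2.0 * month for month in range(24)]
print(damped_trend_forecast(index, 12))
```

Because the trend contribution is multiplied by successive powers of `phi`, each forecast step adds less than the one before, which is how damping keeps the forecast from overshooting.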

Sub-CBSA Region Forecasts

Forecasts for regions within a CBSA, such as counties, hinge off the CBSA model. In part, this is because many economic variables are available at the CBSA level but not for other geographic region types (e.g., housing supply indicators are not available for regions smaller than a CBSA).  Because the prices in a sub-region of a CBSA are co-integrated with those of the CBSA, the forecast can be derived from the difference between the sub-region and the CBSA.  The time series of differences is very stable over time and is modeled using a double exponential smooth with a damped trend.
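The sub-region approach can be sketched as: take the difference between the sub-region and its CBSA, forecast that difference, and add it to the CBSA forecast. In the sketch below the difference is taken in levels, which is an assumption, and the difference forecaster is a deliberately simple placeholder (a recent average) rather than the damped-trend smoother named above. All names and numbers are hypothetical.

```python
def recent_average(differences, horizon, window=6):
    """Placeholder forecaster: hold the difference at its recent average."""
    recent = differences[-window:]
    return [sum(recent) / len(recent)] * horizon

def forecast_subregion(sub_history, cbsa_history, cbsa_forecast, diff_forecaster=recent_average):
    """Sub-region forecast = CBSA forecast + forecast of the (sub-region - CBSA) difference."""
    differences = [sub - cbsa for sub, cbsa in zip(sub_history, cbsa_history)]
    diff_forecast = diff_forecaster(differences, len(cbsa_forecast))
    return [cbsa + diff for cbsa, diff in zip(cbsa_forecast, diff_forecast)]

# Made-up values: a county that sits steadily 10 index points above its CBSA.
cbsa_history = [100.0, 101.0, 102.0, 103.0, 104.0, 105.0]
county_history = [110.0, 111.0, 112.0, 113.0, 114.0, 115.0]
cbsa_forecast = [106.0, 107.0]
print(forecast_subregion(county_history, cbsa_history, cbsa_forecast))
```

Passing the difference forecaster as an argument keeps the structure visible: any univariate model of the difference series, including a damped-trend smoother, could be dropped in.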

State Forecasts

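The state model, like the sub-region model, hinges off the CBSA forecasts.  Because there are multiple CBSAs within a state, an aggregate CBSA price is created by weighting the individual CBSAs according to the number of housing units $N_{i,t}$:

$$\bar{P}_{s,t} = \frac{\sum_{i \in s} N_{i,t} \, P_{i,t}}{\sum_{i \in s} N_{i,t}},$$

where $P_{i,t}$ is the home value of CBSA $i$ at time $t$ and the sums run over the CBSAs in state $s$.

The forecast can be derived from a time series model of the difference between the state value $P_{s,t}$ and the aggregate CBSA price:

$$D_{s,t} = P_{s,t} - \bar{P}_{s,t}.$$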
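The housing-unit weighting and the state-minus-CBSA difference series can be sketched as below. This is an illustration under assumptions: the difference is taken in levels, and the function names, CBSA labels and values are made up.

```python
def aggregate_cbsa_value(values, housing_units):
    """Housing-unit-weighted average of CBSA home values for one period."""
    total_units = sum(housing_units[cbsa] for cbsa in values)
    return sum(values[cbsa] * housing_units[cbsa] for cbsa in values) / total_units

def state_difference_series(state_values, cbsa_values_by_period, units_by_period):
    """Difference between the state value and the aggregate CBSA value in each period."""
    return [state - aggregate_cbsa_value(values, units)
            for state, values, units in zip(state_values, cbsa_values_by_period, units_by_period)]

# Made-up example: two CBSAs in one state, observed over two periods.
cbsa_values = [{"cbsa_a": 200.0, "cbsa_b": 100.0}, {"cbsa_a": 204.0, "cbsa_b": 100.0}]
units = [{"cbsa_a": 3, "cbsa_b": 1}, {"cbsa_a": 3, "cbsa_b": 1}]
state_values = [170.0, 172.0]
print(state_difference_series(state_values, cbsa_values, units))
```

A state forecast would then add a forecast of this difference series to the same weighted aggregate computed from the CBSA forecasts.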

References

Capozza, D., Hendershott, P. and Mack, C. (2004), “An Anatomy of Price Dynamics in Illiquid Markets: Analysis and Evidence from Local Housing Markets,” Real Estate Economics.

Chen, C., Carbacho-Burgos, A., Mehra, S., and Zoller, M. (2011), “The Moody’s Analytics Case-Shiller Home Price Index Forecast Methodology,” Moody’s Analytics Technical Report, http://www.economy.com/csi.

Croissant, Y. and Millo, G. (2008), “Panel Data Econometrics in R: the plm Package,” Journal of Statistical Software, 27(2), http://www.jstatsoft.org/v27/i02.

Hyndman, R. with Razbash, S. and Schmidt, D. (2012), “forecast: Forecasting functions for time series and linear models,” R package version 3.25, http://CRAN.R-project.org/package=forecast

Shan, H. and Stehn, J. (2011), “US House Price Bottom in Sight,” Global Economics Paper No. 209, Goldman Sachs Global Economics, https://360.gs.com.


About the author
Andrew Bruce
Andrew is the Director of Data Sciences at Zillow