Zillow Research

Comparing Now-cast Performance: Zillow versus Google Data

This post is part three of a three-part series on now-casting the MBA Weekly Mortgage Applications Index. To read more on this subject, see the summary, part one and part two.

There is a growing body of research assessing the power of internet searches to estimate the current state of housing and mortgage markets, typically relying on word-based queries for specific terms tracked by the Google Trends search index.[1] Google is, by far, the U.S. market leader for internet searches, accounting for just over two-thirds of all searches according to data available at the time of this writing.[2] There are many advantages to using Google searches as an input to economic forecasts. One limitation, however, is the reliance on word-based searches, which may fail to capture evolving usage or regional differences in vernacular.

More targeted search data might improve the reliability of macroeconomic forecasts that use online queries as an explanatory variable. In this analysis, we identify one such data source: loan requests on the Zillow Mortgage Marketplace (ZMM), a free online marketplace that connects thousands of mortgage borrowers and lenders each day. We test its value as an input to models forecasting the Mortgage Bankers Association’s (MBA) Weekly Mortgage Applications Index by comparing forecasts of the index that use searches on Zillow with similar forecasts that use Google searches. Related analyses compare different models for forecasting the MBA index (part one) and use intra-week data on Zillow searches to now-cast the contemporaneous weekly index, which the MBA reports the following week (part two).

As a baseline reference, we compare our results from the best-performing model identified in part one to a forecast of the MBA Refinance Applications Index published by Rebecca Hellerstein and Menno Middeldorp on the Federal Reserve Bank of New York’s Liberty Street Economics blog.[3] There are several important differences between the Hellerstein and Middeldorp model and the models discussed in our previous posts:

  • Target series. Hellerstein and Middeldorp now-cast the seasonally adjusted (SA) MBA Weekly Refinance Applications Index, whereas our previous analysis now-casts the non-seasonally adjusted (NSA) series. Although the SA series is more relevant for market observers, seasonality in the Zillow search data, along with a time series that is currently too short for meaningful seasonal adjustment, suggests that the NSA series is the appropriate estimation target when now-casting with the Zillow data. To facilitate comparability, however, we use the SA series for all the models.

  • Specification. To now-cast the MBA Refinance Applications Index, Hellerstein and Middeldorp use a model based on Google searches for the term “mortgage refinance” within the “home financing” subcategory for the United States, a lag of the mortgage applications index, and the ten-year Treasury yield and its one-period lag. The MBA index is in percent changes and the remaining explanatory variables are in level differences. By contrast, the previously identified Zillow model uses a first-order lag of the index, the contemporaneous percent change in ZMM searches, and leading, contemporaneous and lagging banking holiday dummies.

  • Time span. The Hellerstein-Middeldorp model was originally estimated for the period covering January 2004 through December 2010, and the Zillow model was estimated using data for June 2011 through March 2014, a substantially shorter period.
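
For reference, the two specifications described above can be written out as regression equations. This is only a sketch of how the verbal description might translate: the symbols are our own shorthand rather than notation from either analysis, and we assume that the lagged index enters in percent changes and that the leading and lagging holiday dummies refer to the following and preceding week.

```latex
% Notation (assumed, not from the source):
%   \%\Delta MBA_t : percent change in the MBA Refinance Applications Index
%   \Delta G_t     : level difference in the Google Trends index
%   \Delta y_t     : level difference in the ten-year Treasury yield
%   \%\Delta Z_t   : percent change in ZMM loan requests
%   H_t            : banking holiday dummy for week t
\begin{align*}
\text{Hellerstein-Middeldorp:}\quad
\%\Delta \mathrm{MBA}_t &= \alpha + \beta_1\,\%\Delta \mathrm{MBA}_{t-1}
  + \beta_2\,\Delta G_t + \beta_3\,\Delta y_t + \beta_4\,\Delta y_{t-1}
  + \varepsilon_t \\
\text{Zillow:}\quad
\%\Delta \mathrm{MBA}_t &= a + b_1\,\%\Delta \mathrm{MBA}_{t-1}
  + b_2\,\%\Delta Z_t + c_1 H_{t+1} + c_2 H_t + c_3 H_{t-1} + u_t
\end{align*}
```

Written this way, the swap tested later amounts to exchanging the search term (the Google index for ZMM loan requests, or the reverse) while leaving the rest of each equation unchanged.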

Finally, note that the now-cast of the MBA Mortgage Refinance Applications Index was the worst-performing of the MBA Mortgage Applications indexes tested using Zillow data, likely because mortgage refinancing currently represents an exceedingly small share of ZMM business activity. However, the ability to compare results to previous research provides a useful external validation. In addition, at the time of this writing, the Google Trends index is only available through the end of January 2014. As a result, we drop the last two months of the data when estimating the Zillow model.

To explore how these differences in time span and model specification affect accuracy, we compare five models:

  • The Hellerstein-Middeldorp model re-estimated for the time period spanning January 2004 through December 2010;

  • The Hellerstein-Middeldorp model estimated for the time period spanning June 2011 through January 2014;

  • The Hellerstein-Middeldorp model estimated for the time period spanning June 2011 through January 2014 and replacing the Google index with Zillow Mortgage Marketplace refinance loan requests as an explanatory variable;

  • The previously identified Zillow model estimated for the time period spanning June 2011 through January 2014; and

  • The previously identified Zillow model estimated for the time period spanning June 2011 through January 2014 and replacing Zillow Mortgage Marketplace refinance loan requests with the Google Trends index for “mortgage refinance” as an explanatory variable.

The table below shows the results of these regressions. We replicate Hellerstein and Middeldorp’s results almost exactly. When compared by adjusted R-squared, mean absolute percentage error (MAPE), and root mean squared error (RMSE), the models including the Zillow data outperform the models including the Google Trends index. In particular, when the percent change in ZMM loan requests replaces the Google Trends index in the basic Hellerstein-Middeldorp model for the period covering June 2011 through January 2014, adjusted R-squared, the in-sample goodness-of-fit metric, increases from 36 percent to 48 percent, a substantial improvement for a percent-change series.
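
For readers who want to reproduce this kind of comparison, the three metrics named above can be computed as in the following minimal sketch. The function names and the sample numbers are hypothetical illustrations, not values from our regressions, and the formulas are the standard textbook definitions rather than a confirmed description of the exact calculations behind the table.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error of the predictions."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so it is usually more
    informative on index levels than on a percent-change series.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def adjusted_r_squared(actual, predicted, n_regressors):
    """R-squared penalized for the number of regressors (intercept excluded)."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r_squared = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_regressors - 1)


# Hypothetical data, for illustration only.
actual = [2.0, 4.0, 6.0, 8.0, 10.0]
predicted = [2.5, 3.5, 6.5, 7.5, 10.0]

print(round(rmse(actual, predicted), 4))
print(round(mape(actual, predicted), 4))
print(round(adjusted_r_squared(actual, predicted, n_regressors=1), 4))
```

Because adjusted R-squared penalizes additional regressors, it is a fairer basis than plain R-squared for comparing the Hellerstein-Middeldorp and Zillow specifications, which differ in the number of explanatory variables.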

 

[Table: MBAForecast-Fig9]

 

[1] For a review, see Liran Einav and Jonathan D. Levin, “The Data Revolution and Economic Analysis,” National Bureau of Economic Research (NBER) Working Paper 19035, May 2013.

[2] comScore, “comScore Releases February 2014 U.S. Search Engine Rankings,” Reston, VA, March 18, 2014.

[3] Rebecca Hellerstein and Menno Middeldorp, “Forecasting with Internet Search Data,” Liberty Street Economics, Federal Reserve Bank of New York, January 4, 2012. We thank Rebecca Hellerstein and Menno Middeldorp for providing insight into their original analysis.
