Zillow Research

Methodology: Home Sales

Overview

Zillow is releasing a new sales metric providing the number of new and existing home sales across the country. The key elements of this series:

  • Home sales are reported for arm's-length transactions of single-family, condominium and cooperative properties.
  • Home sales are reported at the national, metro and county levels.
  • The series starts in June 2008. A transaction date is defined as the closing date recorded on the county deed.
  • A seasonally adjusted series will be available, produced using the X-12-ARIMA[1] methodology.
  • All standard real estate transactions are included in this metric, including REO sales and auctions. Substantial effort has been made to remove transactions not typically considered a standard sale. Examples include bank takeovers of foreclosed properties, title transfers after a death or divorce, and non-arm's-length transactions. An unfiltered series showing all transactions observed by Zillow is also available.

Methodology

Across the country, home sales are recorded by county governments. However, given the different systems that counties have in place for reporting these sales, there is typically a lag from contract signing to when a sale is recorded, and again to when data users can view transaction data. This lag is highly variable, ranging anywhere from a few days to multiple months. Thus, reporting home sales on a short time horizon is an exercise in nowcasting, that is, adjusting the observed data for latent, unobserved data.

Given that latency is primarily a function of county reporting, it is natural to perform the adjustment at the county level. Each county adjustment is done with one of two dynamically chosen methods, depending on the nature of the county's latency. These methods are described at the end of this brief. Metro sales are simply the sum of sales in their member counties.
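As a small illustration of the metro roll-up, the sketch below sums adjusted county counts into metro totals. The county and metro names and the counts are hypothetical placeholders, not Zillow data.

```python
# Hypothetical adjusted county sales for one month (placeholder names and counts).
county_sales = {"county_a": 1200, "county_b": 450, "county_c": 310}

# Hypothetical mapping from each metro to its member counties.
metro_members = {"metro_x": ["county_a", "county_b"], "metro_y": ["county_c"]}


def metro_sales(county_sales, metro_members):
    """Return metro totals as the sum of sales in each metro's member counties."""
    return {
        metro: sum(county_sales[county] for county in counties)
        for metro, counties in metro_members.items()
    }


totals = metro_sales(county_sales, metro_members)
```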

To compute a national sales number, another layer of latency adjustment is performed. An initial national number is computed as the sum of all county sales. However, a number of counties are so latent in their data reporting that no attempt is made to produce a series for them. As a result, this national aggregate undercounts sales. We adjust for this by running the weighted regression:

Eq1

Here, d represents a given prediction depth. For example, June 2014 sales reported in September 2014 would be a depth of 2. Observations further back in time are less likely to need revision and are weighted more heavily; more recent observations in the training set are down-weighted.
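The exact regression specification is not reproduced here, so the following is only a sketch of the general idea under stated assumptions: a one-variable weighted least-squares fit of a fully revised national total on the initial sum of county sales, with older observations given larger weights. The function name, the linear form, the weighting scheme, and all numbers are illustrative assumptions.

```python
def weighted_linear_fit(x, y, weights):
    """Closed-form weighted least squares for y = intercept + slope * x."""
    total_w = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_w
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Illustrative training data at one prediction depth, ordered oldest to newest.
county_sum = [100.0, 200.0, 300.0, 400.0]    # initial sum of county sales
final_total = [115.0, 225.0, 335.0, 445.0]   # assumed fully revised national totals
weights = [4.0, 3.0, 2.0, 1.0]               # assumption: older months weigh more

intercept, slope = weighted_linear_fit(county_sum, final_total, weights)

# Apply the fit to a new initial county sum of 500.
adjusted_national = intercept + slope * 500.0
```

A separate fit per depth d would let the adjustment shrink as a month ages and more of its transactions are observed.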

It is important to recognize that this metric is a forecast. As such, it is subject to revisions each month with the addition of new data. The parameters tuned for the seasonal adjustment using the X-12-ARIMA methodology will be re-evaluated annually. Also, as with all economic metrics produced by Zillow, sales go through a battery of unit tests that look for abnormalities or volatility in the data. Backtesting is also implemented; regional adjustments that have not performed well historically under this methodology are not reported.

County Latency Adjustment Details

Method A:

Some counties demonstrate consistency in their latency, allowing us to predict what proportion of sales has been observed by a given date from observed transactions and historical latency. Historical latency is the percentage of all sales transacting in a given month that are still unobserved as of a later date. For example, suppose on July 15 we are adjusting the number of sales in June for latent transactions. Looking backward in time to periods where we can assume we have finally seen the full set of transactions, we can compute the monthly average share of all sales reported within 15 days of a month’s end. The number of transactions in June is then:

Eq2

A similar computation is done for previous months. To report May numbers on July 15, for example, we estimate the Historical 45-day Latency.
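A minimal sketch of the Method A adjustment follows. It assumes the adjusted count is the observed count divided by one minus the historical latency, which is one reading of the description above rather than a confirmed formula. Function names and numbers are hypothetical.

```python
def historical_latency(history):
    """Average share of a month's sales still unobserved at the cutoff.

    history: list of (observed_by_cutoff, final_total) pairs for past months
    whose transactions are assumed to be fully observed by now.
    """
    shares = [1 - observed / total for observed, total in history if total > 0]
    return sum(shares) / len(shares)


def adjust_sales(observed, latency):
    """Scale observed sales up to an estimate of the full monthly total."""
    if not 0 <= latency < 1:
        raise ValueError("latency must be in [0, 1)")
    return observed / (1 - latency)


# Illustrative past months: sales seen within 15 days of month end vs. final count.
past_months = [(80, 100), (70, 100), (90, 100)]
latency_15_day = historical_latency(past_months)

# June sales observed as of July 15, adjusted for transactions not yet reported.
june_estimate = adjust_sales(400, latency_15_day)
```

Reporting May on July 15 would reuse `adjust_sales` with a latency computed from a 45-day cutoff instead.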

Method B:

While some counties are quite consistent in their reporting of sales, empirical evidence shows that many counties are not consistent from month to month in the latency of their reporting. It is not hard to construct reasonable scenarios as to why this might be, including lower throughput during the holidays and the scale of county offices. With this in mind, a linear model is constructed that explains transactions T_m in a given month m as a function of quarterly indicator variables and:

  • Eq3 – The number of sales transacting in month m that were recorded in months m through the current month; a measure of observed transactions.
  • Eq4 – The proportion of sales recorded in months m through the current month that transacted in month m, out of all sales recorded over this same time span; a measure of timeliness.

When a larger share of the sales reported in a given month actually occurred in that month, the county's backlog of unreported sales is likely smaller, and so we adjust prior months by a smaller factor.
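To make the two Method B regressors concrete, the sketch below builds one feature row for a county and month from a list of sale records. The record layout, the integer month index (with index 0 taken to be a January), and the function name are assumptions for illustration. Fitting the linear model itself, for example by ordinary least squares against fully observed historical totals, is left out.

```python
from collections import namedtuple

# Hypothetical record: the month a sale transacted and the month it was recorded,
# both as integer month indices where index % 12 == 0 is assumed to be January.
Sale = namedtuple("Sale", ["transacted", "recorded"])


def method_b_features(sales, m, current):
    """Return [Q1, Q2, Q3, Q4 indicators, observed count, timeliness] for month m."""
    # All sales recorded in months m through the current month.
    window = [s for s in sales if m <= s.recorded <= current]
    # Observed transactions: those in the window that transacted in month m.
    observed = sum(1 for s in window if s.transacted == m)
    # Timeliness: share of the window's sales that transacted in month m.
    timeliness = observed / len(window) if window else 0.0
    quarter = (m % 12) // 3
    indicators = [1.0 if quarter == q else 0.0 for q in range(4)]
    return indicators + [float(observed), timeliness]


# Illustrative records: month 5 is a June, month 6 is the current month (July).
records = [Sale(5, 5), Sale(5, 6), Sale(4, 5), Sale(3, 6), Sale(6, 6)]
row = method_b_features(records, m=5, current=6)
```

A high timeliness value means most recently recorded sales belong to month m itself, which corresponds to the smaller-backlog case described above.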

 

[1] https://www.census.gov/srd/www/x12a/
