Zillow Rent Index: Methodology

Posted by: Yeng Bun    Posted date: March 12, 2012



Introduction

Modeled on the Zillow Home Value Index (ZHVI), the Zillow Rent Index (ZRI) tracks the monthly median rent in particular geographical regions. As with the ZHVI, we sought to create a rent index that is unaffected by the mix of homes for rent at any particular time. This makes comparisons of rents over time more valid, since the index tracks rents for a consistent stock of homes. It also makes the ZHVI and ZRI easier to compare, since both are based on a similar set of homes. Traditional metrics tracking rent and sale prices, by contrast, are often based on markedly different sets of homes (often located in different neighborhoods), which makes comparisons less valid.

Underlying Data

As with the Zestimate, we estimate rents (Rent Zestimates) using proprietary statistical and machine learning models. Within each county or state, the models observe recent rental listings and learn the relative contribution of various home attributes in predicting prevailing rents. These home attributes include physical facts about the home, prior sale transactions, tax assessment information and geographic location, as well as the estimated market value of the home (Zestimate). Based on the patterns learned, these models estimate rental prices for all homes, including those not presently for rent. The purpose of the Rent Zestimate is to give consumers an indication of the fair market rent for a home, while the purpose of the ZRI is to give consumers insight into rental price trends in a way that is not biased by the mix of homes currently for rent.
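To make the mix-of-homes point concrete, here is a minimal Python sketch. The five homes and their rent estimates are invented for illustration and are not Zillow data. It shows that a median taken only over currently listed homes can shift when the set of listings changes, even though no individual rent has moved, while a median over the full stock of homes stays put.

```python
from statistics import median

# Hypothetical rent estimates (USD per month) for a fixed stock of five homes.
# These values are invented purely for illustration.
rent_estimates = {"A": 900, "B": 1100, "C": 1300, "D": 1800, "E": 2400}

def median_listed_rent(listed_ids):
    """Median over only the homes currently listed for rent."""
    return median(rent_estimates[h] for h in listed_ids)

def median_all_homes():
    """Median over every home in the stock, listed or not."""
    return median(rent_estimates.values())

# Month 1: mostly cheaper homes happen to be listed.
month1 = median_listed_rent(["A", "B", "C"])
# Month 2: mostly pricier homes happen to be listed; no rent has changed.
month2 = median_listed_rent(["C", "D", "E"])

print(month1, month2)       # the listed-only median moves with the mix
print(median_all_homes())   # the all-homes median is the same in both months
```

The listed-only figure rises between the two months purely because different homes are on the market, which is the kind of bias an all-homes index is meant to avoid.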

Because the models are trained on Zillow rental listing data, Rent Zestimates are only available back to November 2010 and, consequently, each ZRI time series begins in that month as well. We generate the ZRI at eight geographic levels: neighborhood, ZIP code, city, congressional district, county, metropolitan area, state and the nation.

Market Segments

Within each region, we calculate the ZRI for various subsets of homes (or market segments) so as to afford greater insight into what is happening in a particular market. All market segments are shown in the table below. Apartments are treated as condominiums. For more details about market segments, please see the Zillow Home Value Index methodology.

Table 1: Market Segments for Zillow Rent Index

Market Segment  Number of Rent Zestimates  Description
All Homes       84.9 M                     Single family + condominium + cooperative
Single Family   75.5 M                     Single family only
Condo           9.3 M                      Condominium + cooperative only
0 or missing    42.9 M                     0 Bedroom
1 Bedroom       1.7 M                      1 Bedroom
2 Bedroom       11.4 M                     2 Bedroom
3 Bedroom       29.4 M                     3 Bedroom
4 Bedroom       12.2 M                     4 Bedroom
5+ Bedroom      3.3 M                      5 Bedroom or more
Top Tier        26.2 M                     Top price tier among homes within the same metropolitan area
Middle Tier     26.2 M                     Middle price tier among homes within the same metropolitan area
Bottom Tier     26.2 M                     Bottom price tier among homes within the same metropolitan area

 

Methodology

Using the estimated rent of every home as represented in the Rent Zestimate, the main steps in the construction of the ZRI are as follows:

  1. Calculate Raw Median Rent Zestimates
  2. Apply Simple Three-Month Moving Average
  3. Final Quality Control

Calculate Raw Median Rent Zestimates

Let t be a discrete time variable taking one value at the end of each month. Let H(t) be an M by N matrix whose element h_ij(t) is the number of homes with a Rent Zestimate at time t in the i-th market segment and j-th geographical region, where M is the total number of market segments and N is the total number of unique regions having a minimum required number of Rent Zestimates. Currently, M = 12 and N = 57,022. Geographical regions include national, state, metro, county, city, ZIP code, neighborhood and congressional district. The Number of Rent Zestimates column in Table 1 above gives h_ij(t) for each market segment i when j = 'National' and t = 'Jan-2012'.

Let z_ij(t) be the vector of Rent Zestimates of all homes in the i-th market segment and j-th region at time t; its length is h_ij(t). The raw median Rent Zestimate, r_ij(t), for the i-th market segment and j-th region is defined as:

r_ij(t) = Median(z_ij(t))

Each r_ij(t) is an element of the M by N matrix R(t). To ensure reliability and stability, we compute r_ij(t) only when h_ij(t) is above a minimum threshold. For Jan 2012, there are a total of 391,375 unique combinations of region and market segment for which the median could be computed:

Count{r_ij(t) ≠ NA, for i = 1, ..., M and j = 1, ..., N} = 391,375.
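The raw-median step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `raw_medians`, the `MIN_HOMES` value and the sample rows are hypothetical, and the actual minimum threshold is not given in this post. The sketch groups rent estimates by (market segment, region), takes the median of each group, and records a missing value (None, standing in for NA) when a group has too few homes.

```python
from collections import defaultdict
from statistics import median

# Hypothetical minimum group size; the real threshold is not published here.
MIN_HOMES = 3

def raw_medians(homes, min_homes=MIN_HOMES):
    """Map (segment, region) to the median rent, or None if the group is too small.

    `homes` is an iterable of (segment, region, rent_estimate) rows.
    This sketch assumes a group qualifies when it has at least `min_homes` homes.
    """
    groups = defaultdict(list)
    for segment, region, rent in homes:
        groups[(segment, region)].append(rent)
    return {
        key: (median(rents) if len(rents) >= min_homes else None)
        for key, rents in groups.items()
    }

# Invented sample rows for one month.
homes = [
    ("All Homes", "County X", 1000),
    ("All Homes", "County X", 1200),
    ("All Homes", "County X", 1500),
    ("Condo", "County X", 900),  # only one home: too few to report a median
]

medians = raw_medians(homes)
computable = sum(1 for value in medians.values() if value is not None)
print(medians)
print(computable)  # analogous to the Count{...} expression above
```

Counting the non-missing entries of the result plays the same role as the count of computable region and segment combinations reported for Jan 2012.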

Table 2 shows the number of regions for which a raw median could be computed, by region level and market segment. For example, we have usable data to calculate raw medians in 2,485 counties for the single-family home market segment.

Table 2: Number of regions with a raw median Rent Zestimate, by region level and market segment

Market Segment  National  State  MSA    County  Congressional District  City     Neighborhood  ZIP Code
All Homes       1         51     848    2,486   433                     21,229   8,475         22,672
Single Family   1         51     848    2,485   433                     21,154   7,810         22,482
Condo           1         51     460    806     410                     4,032    3,035         6,467
0 or missing    1         51     825    2,301   433                     17,078   4,772         18,894
1 Bedroom       1         51     488    956     416                     2,305    1,115         3,757
2 Bedroom       1         51     713    1,680   432                     9,331    3,899         12,040
3 Bedroom       1         51     784    1,964   433                     13,130   5,497         15,594
4 Bedroom       1         51     739    1,689   432                     8,819    3,443         11,676
5+ Bedroom      1         51     586    1,160   430                     4,123    1,672         6,673
Top Tier        1         49     838    1,523   429                     11,892   3,984         14,070
Middle Tier     1         49     838    1,549   429                     13,469   4,773         15,712
Bottom Tier     1         49     839    1,511   428                     12,069   5,186         14,374
Total           12        606    8,806  20,110  5,138                   138,631  53,661        164,411

 

Apply Simple Three-Month Moving Average

We apply a simple three-month moving average to R(t) to filter out noise in the data:

ZRI(t) = [R(t) + R(t-1) + R(t-2)] / 3

The resulting M by N matrix ZRI(t) is a smoothed estimate of the median rent, with much of the month-to-month noise removed. Smoothing may be less necessary for large regions such as the nation and states, where the underlying data set is large, but it is applied at all levels for consistency.
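As an illustration of the smoothing step, the following Python sketch applies a trailing three-month average to a single monthly series of raw medians. The function name and sample values are hypothetical. The post does not say how the first two months of a series are handled, so this sketch assumes they are left missing (None), as is any month whose window contains a missing raw median.

```python
def three_month_average(raw):
    """Trailing three-month mean: ZRI(t) = (R(t) + R(t-1) + R(t-2)) / 3.

    `raw` is a list of monthly raw medians in time order; entries may be None.
    Assumption: months without a complete three-month window stay None.
    """
    smoothed = []
    for t in range(len(raw)):
        window = raw[max(0, t - 2): t + 1]
        if len(window) < 3 or any(value is None for value in window):
            smoothed.append(None)
        else:
            smoothed.append(sum(window) / 3)
    return smoothed

# Invented raw medians for four consecutive months in one region and segment.
raw = [1200, 1230, 1170, 1260]
zri = three_month_average(raw)
print(zri)  # [None, None, 1200.0, 1220.0]
```

In the full calculation the same averaging would be applied to every element of R(t), that is, to every region and market segment combination.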

Final Quality Control

The time series matrix ZRI(t) has the same dimensions as H(t), namely M by N (as noted, 12 x 57,022). While this could in theory produce more than 680,000 different time series (12 x 57,022 = 684,264), in practice many time series are eliminated because of data sparseness or temporal volatility. A ZRI time series for a particular combination of region and market segment is suppressed from the publicly available data set when any of the following conditions holds:

  1. Number of Rent Zestimates < [threshold]
  2. Number of rental listings in most recent three months < [threshold]
  3. Temporal volatility measured by annualized, monthly or quarterly change > [threshold]
  4. Region has been deemed suspect based on a manual review
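The suppression logic above could be expressed roughly as follows. This is a hedged sketch, not the production rule set: the `SeriesStats` fields, the `is_suppressed` function and the numeric limits are all hypothetical, and the real thresholds are not disclosed in this post. The sketch also collapses the annualized, monthly and quarterly volatility checks into a single largest-change measure for brevity.

```python
from dataclasses import dataclass

@dataclass
class SeriesStats:
    """Summary statistics for one (region, market segment) time series."""
    num_rent_zestimates: int
    num_recent_listings: int   # rental listings in the most recent three months
    max_abs_change: float      # largest absolute fractional change (volatility proxy)
    manually_flagged: bool = False

def is_suppressed(stats, min_zestimates, min_listings, max_change):
    """Return True if any one of the four suppression criteria applies."""
    return (
        stats.num_rent_zestimates < min_zestimates
        or stats.num_recent_listings < min_listings
        or stats.max_abs_change > max_change
        or stats.manually_flagged
    )

# Placeholder limits chosen only so the example runs; they are NOT Zillow's values.
PLACEHOLDER_LIMITS = dict(min_zestimates=100, min_listings=10, max_change=0.25)

dense_stable = SeriesStats(num_rent_zestimates=5000, num_recent_listings=40, max_abs_change=0.05)
sparse = SeriesStats(num_rent_zestimates=60, num_recent_listings=40, max_abs_change=0.05)
flagged = SeriesStats(5000, 40, 0.05, manually_flagged=True)

print(is_suppressed(dense_stable, **PLACEHOLDER_LIMITS))  # False
print(is_suppressed(sparse, **PLACEHOLDER_LIMITS))        # True
print(is_suppressed(flagged, **PLACEHOLDER_LIMITS))       # True
```

Passing the limits as parameters keeps the unknown thresholds out of the logic itself, so the same function works whatever values are actually used.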

Applying the suppression criteria above, there are 195,258 unique deliverable ZRI time series for the report period ending Jan 2012. Table 3 below shows the count of regional time series by region level and market segment. For example, there are 515 time series at the county level for the single-family home variant of the ZRI.

Table 3: Number of deliverable ZRI time series by region level and market segment

Market Segment  National  State  MSA    County  Congressional District  City     Neighborhood  ZIP Code
All Homes       1         43     277    515     352                     6,789    4,695         8,916
Single Family   1         43     277    515     352                     6,774    4,447         8,838
Condo           1         43     247    424     342                     2,970    2,227         4,988
0 or missing    1         43     276    510     352                     5,486    2,814         7,595
1 Bedroom       1         43     242    417     346                     1,685    970           2,950
2 Bedroom       1         43     277    514     352                     6,151    3,921         8,362
3 Bedroom       1         43     275    508     352                     5,146    3,055         7,420
4 Bedroom       1         43     276    512     352                     5,244    2,723         7,559
5+ Bedroom      1         43     268    490     352                     2,914    1,377         5,077
Top Tier        1         42     274    497     352                     5,510    2,458         7,527
Middle Tier     1         42     274    498     352                     6,205    3,339         8,390
Bottom Tier     1         42     274    493     351                     5,579    3,512         7,783
Total           12        513    3,237  5,893   4,207                   60,453   35,538        85,405

 

Restatement

Unlike the ZHVI, the ZRI is not restated in the routine monthly calculations, because Rent Zestimates do not depend on data that arrive with some latency, such as public record transactional data (as is the case with Zestimates and the corresponding ZHVI). However, there are two situations in which restatements are unavoidable. First, when the boundaries of a geographic region change, the ZRI for the region will change as well, since the set of homes underlying the ZRI is different. Second, when we regenerate historical Rent Zestimates (for example, when a more accurate algorithm is developed), we also have to regenerate all historical ZRIs.

We are continuously working on improving the underlying algorithm to make Rent Zestimates more accurate. When major improvements to the algorithm are made, we will re-compute the historical Rent Zestimates for affected homes. Our purpose in doing so is to provide consumers with the best estimate of historical rents.

Data Coverage

We calculate the ZRI at the national level as the median Rent Zestimate of 84.9 million homes. The interactive map below displays the number of Rent Zestimates by county for the period ending Jan 31, 2012.


Some county-level ZRIs are suppressed based on the filter rules discussed in the Final Quality Control section above (although Rent Zestimates in those counties are used in computing higher-level ZRIs). The interactive map below shows counties that have a valid ZRI as of January 2012 (green) and those counties where the ZRI has been suppressed based on filter rules but individual Rent Zestimates are still available (red).



About the author
Yeng Bun
Yeng is a Senior Data Scientist at Zillow.