ZTRAX FAQs
Zillow Transaction and Assessor Dataset (ZTRAX): Frequently Asked Questions
These tips are intended to help researchers understand the ZTRAX database. You can learn more about ZTRAX and request access here.
How often is ZTRAX updated?
While we strive to update ZTRAX at least quarterly, we do not have the resources to commit to a fixed cadence at this point.
How is the data structured?
The zipped folders stored on Exavault correspond to states and are named according to state FIPS codes. For information on the contents of each folder, see the data schema and data dictionary.
When does the ZTRAX data coverage begin?
ZTRAX’s temporal coverage varies greatly by county. Some counties have digitized records extending back to the early 1990s, whereas other counties have only provided data to the mid-2000s. In Zillow’s work analyzing these data sets to create home value indices, we find sample sizes are sufficient for most areas going back to April 1996.
Data coverage seems to vary widely over time and over space. What causes these discrete changes?
Zillow sources ZTRAX from a large third-party provider and through an internal initiative we call County Direct. The data coverage gaps we inherit through our third-party source are due to county recording procedures, as well as the data collection process of that third party. Changes to the collection process have occurred over the previous decade due to mergers and other organizational changes, as well as process improvements. Because of the gaps in coverage, Zillow instituted its County Direct program. This program prioritizes counties based on a set of characteristics and supplements the third-party coverage by collecting data directly from county Assessor and Recorder’s offices. We expect ZTRAX quality to improve gradually over time as a result.
Why doesn’t ZTRAX contain rental data?
Rent, unlike a home purchase, is not a public record. Zillow’s rental data, such as rent list prices, are provided by individual property owners and managers. These listings in turn power Rental Zestimates. Who ‘owns’ listing data (and their derivatives) is nuanced and determined by the individual contracts. While we would love to be able to share what is fast becoming the largest rental database in the U.S., we are legally unable to do so under the relatively light requirements of the ZTRAX data agreement.
ZTRAX does contain sales of commercial properties, including multifamily/apartment buildings.
How accurate are the latitude and longitude coordinates in ZTRAX?
The geocodes are enhanced TIGER coordinates. They differ from the rooftop geocodes (parcel centroids) that we use on the site, for which we do not have redistribution rights. The coordinates provided are interpolated: each address is placed along its block segment according to its house number. For example, in a 100-199 block range, 150 would fall in the middle. These coordinates are more useful for measuring distance between properties than for rendering on a map. Most users re-geocode the properties for display using the property addresses and a third-party mapping service.
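To make the interpolation concrete, here is a minimal sketch of address-range interpolation along a single straight block segment. It illustrates the general idea only and is not the geocoder used to produce ZTRAX: the function name, the endpoint coordinates, and the straight-line segment are all assumptions made for the example.

```python
def interpolate_address(house_number, range_low, range_high, start, end):
    """Place a house number along a straight segment by linear interpolation.

    start and end are (latitude, longitude) tuples for the segment endpoints.
    """
    if range_high == range_low:
        fraction = 0.5  # single-address range: fall back to the midpoint
    else:
        fraction = (house_number - range_low) / (range_high - range_low)
    # Clamp so out-of-range house numbers still land on the segment.
    fraction = min(1.0, max(0.0, fraction))
    lat = start[0] + fraction * (end[0] - start[0])
    lon = start[1] + fraction * (end[1] - start[1])
    return lat, lon


# Hypothetical endpoints for the 100-199 block of a street.
block_start = (47.6000, -122.3300)
block_end = (47.6000, -122.3200)
lat, lon = interpolate_address(150, 100, 199, block_start, block_end)
```

House number 150 sits at fraction 50/99 of the way along the segment, which is roughly the middle. Points placed this way preserve the ordering of addresses along a street, which is why they work for distance measures, but they will not coincide with rooftops.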
There are different entries for square footage for the same building and BuildingAreaStndCode. What’s happening here?
Buildings/properties are uniquely identified by RowID (which uniquely identifies the assessor parcel) and BuildingOrImprovementNumber (which identifies different buildings on the same assessor parcel). BuildingAreaStndCode is used to determine the specific space within the building that the square footage corresponds to. When the original source provides individual square-footage measures for different areas of the property, it sometimes generalizes these spaces as “living areas” and doesn’t always specify whether these “living areas” are on separate floors. To maintain granularity, we capture separate measures as provided, sequence them, and encode them as “BAL – Building Area Living”. If your analysis is constrained within a county, or across only a small subset of counties, you are encouraged to explore the behavior of these variables and encodings for each county before creating a rule, potentially by county, for determining the square footage of the total living area of a property. If your analysis spans many counties, some assumptions must be made. A common assumption is to sum by RowID and BuildingOrImprovementNumber all BuildingArea entries where BuildingAreaStndCode = “BAL”.
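The common assumption described above can be sketched as follows. This is a minimal illustration, not a recommended rule for every county. The rows are made-up sample data, the column name BuildingAreaSqFt is an assumed spelling for the area field (check the data dictionary), and "OTHER" is a placeholder rather than a real BuildingAreaStndCode value.

```python
from collections import defaultdict


def total_living_area(building_area_rows):
    """Sum 'BAL' areas per (RowID, BuildingOrImprovementNumber)."""
    totals = defaultdict(float)
    for row in building_area_rows:
        if row["BuildingAreaStndCode"] != "BAL":
            continue  # keep only entries encoded as living area
        area = row.get("BuildingAreaSqFt")  # assumed column name
        if area is None:
            continue  # skip entries with no recorded area
        key = (row["RowID"], row["BuildingOrImprovementNumber"])
        totals[key] += area
    return dict(totals)


# Made-up sample rows: two living-area entries for building 1,
# one non-living entry, and a second building on the same parcel.
rows = [
    {"RowID": "A1", "BuildingOrImprovementNumber": 1,
     "BuildingAreaStndCode": "BAL", "BuildingAreaSqFt": 900},
    {"RowID": "A1", "BuildingOrImprovementNumber": 1,
     "BuildingAreaStndCode": "BAL", "BuildingAreaSqFt": 600},
    {"RowID": "A1", "BuildingOrImprovementNumber": 1,
     "BuildingAreaStndCode": "OTHER", "BuildingAreaSqFt": 400},
    {"RowID": "A1", "BuildingOrImprovementNumber": 2,
     "BuildingAreaStndCode": "BAL", "BuildingAreaSqFt": 300},
]
living_area = total_living_area(rows)
```

Summing can overstate living area in counties where one of the "BAL" entries is already a total, so it is worth spot-checking a few parcels per county before relying on the result.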
RowID is the unique id. Is it also a permanent id? If there is an update does that ID remain the same across versions? What happens when a home is torn down and the lot split into two parcels? What happens to the old id when the two new ids are created?
The RowID uniquely identifies assessor parcels and can be used to match the most current assessor data on parcels (stored in ZAsmt) to previous assessor parcel records (stored in ZAsmtHist), as well as to assessor records in future versions of ZTRAX. When a lot is split into two parcels, two new RowIDs are created. The old parcel records will remain in ZAsmtHist under the original RowID, but will likely be missing from the current version of ZAsmt, as technically that original parcel no longer exists. A preferred alternative to RowID, which we inherit from our data provider, is ImportParcelID. See below.
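One way to spot parcels that may have been split or retired is to look for RowIDs that appear in ZAsmtHist but not in the current ZAsmt. The sketch below assumes the RowIDs from each table have already been loaded into Python collections; the function name and the sample IDs are hypothetical.

```python
def retired_row_ids(current_row_ids, historical_row_ids):
    """Return RowIDs present in the historical records but absent from the current ones."""
    return set(historical_row_ids) - set(current_row_ids)


# Made-up IDs: R1 was split into R3 and R4, so it survives only in the history.
zasmt_ids = ["R2", "R3", "R4"]
zasmt_hist_ids = ["R1", "R1", "R2"]
retired = retired_row_ids(zasmt_ids, zasmt_hist_ids)
```

A RowID that is missing from the current table is only a candidate: the comparison alone cannot tell a lot split from any other reason a parcel might have dropped out.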
Are the ZAssessment tables generated from the ZTransaction tables? Why are sales prices provided in both databases?
Generally, you can think of the data in the ZAssessment tables as sourced ultimately from county assessors’ offices, and the data in the ZTransaction tables as sourced ultimately from legal recordings processed by county recorders’ offices. These are usually two separate agencies in the county administration. The Assessor’s office tracks many things, like property attributes, completely independently from the County Recorder’s office. However, when the County Assessor reports sale prices on homes (the SalesPriceAmount variable in the ZAssessment tables), this is data that the county assessor’s office has taken from the recorder’s office and blended into its data set before sending it to us. Some counties do this so they can use the most recent sales prices in their assessment amount models. That being said, we’ve found that the transaction data we get through assessors tends to be marginal and not always up to date, so when available, use the transaction data reported in the ZTransaction tables.
What is the primary reason an address is not included in Zillow data?
The inclusion or exclusion of an address often depends on the county reporting procedures. The county’s Assessor’s Office always supplies an assessor parcel number (APN) and very often an address as well for each parcel. The transactions data comes from a different agency. Legal recordings (including transactions) are processed by each county’s recorder’s office. It’s not uncommon for county recorders to not record the address (or sometimes even APN) on the legal recordings. Many times they’ll use the full legal description of the parcel(s) involved to describe the parcel(s) on the document. In these cases, it’s not possible to do systematic mapping of the legal recording to the specific parcel(s) involved. The presence of an APN or address on a legal recording seems to be very dependent on the county and state it’s recorded in.
Why do some transactions in the ZTrans tables not have ImportParcelIDs?
The ImportParcelID is a field we compute at Zillow based on assessor parcel numbers (APNs) to try to link transactions to assessments, as well as assessment editions year over year. When a legal recording has neither an APN nor an address (see the answer to the previous question), it is expected that it will also lack an ImportParcelID.
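The linking role of ImportParcelID can be illustrated with a small join. This is a sketch under stated assumptions: records are plain dictionaries, TransId is a hypothetical transaction identifier, and each ImportParcelID is assumed to appear at most once in the assessment records passed in.

```python
def link_transactions(transactions, assessments):
    """Pair each transaction with its assessor record via ImportParcelID.

    Returns (linked, unlinked), where linked is a list of
    (transaction, assessment) pairs and unlinked is a list of transactions.
    """
    by_parcel = {
        a["ImportParcelID"]: a
        for a in assessments
        if a.get("ImportParcelID") is not None
    }
    linked, unlinked = [], []
    for t in transactions:
        parcel_id = t.get("ImportParcelID")
        match = by_parcel.get(parcel_id) if parcel_id is not None else None
        if match is None:
            unlinked.append(t)  # no APN/address on the recording, or no matching parcel
        else:
            linked.append((t, match))
    return linked, unlinked


# Made-up sample records.
assessments = [{"ImportParcelID": 101, "RowID": "A1"}]
transactions = [
    {"TransId": 1, "ImportParcelID": 101},
    {"TransId": 2, "ImportParcelID": None},  # recording lacked an APN and address
    {"TransId": 3, "ImportParcelID": 999},   # parcel not in this assessment extract
]
linked, unlinked = link_transactions(transactions, assessments)
```

Keeping the unlinked transactions in a separate list, rather than dropping them, makes it easy to measure how much of a county's transaction volume cannot be tied to a parcel.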
What variable best represents the true date of sale? (Recording Date vs Document Date vs something else)
The date of the document is generally provided on all documents, typically in the opening paragraph. If a date is also reported at the signature line, it is captured in the Signature Date field. NOTE: If only a month and year are provided, then “01” is entered for the day. The Document Date will be the same as or earlier than the Recording Date.
Generally, we recommend using the Document Date. If it is missing, use the Signature Date; if that is also missing, use the Recording Date.
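The fallback order recommended above can be written as a simple coalesce. This is a minimal sketch: the field spellings DocumentDate, SignatureDate, and RecordingDate are assumed (check the data dictionary), dates are treated as opaque strings, and the sample records are made up.

```python
# Preferred order of date fields, most preferred first (assumed spellings).
DATE_FIELDS = ("DocumentDate", "SignatureDate", "RecordingDate")


def sale_date(record):
    """Return (date, source_field) using the first non-missing date field."""
    for field in DATE_FIELDS:
        value = record.get(field)
        if value:  # treats None and empty strings as missing
            return value, field
    return None, None


# Made-up sample records.
complete = {"DocumentDate": "2015-03-01", "RecordingDate": "2015-03-12"}
no_document_date = {"DocumentDate": "", "SignatureDate": "2015-03-05",
                    "RecordingDate": "2015-03-12"}
recording_only = {"RecordingDate": "2015-03-12"}

results = [sale_date(r) for r in (complete, no_document_date, recording_only)]
```

Returning the source field alongside the date lets you check later whether results are sensitive to records that fell back to the recording date. Remember that a day of "01" may mean only the month and year were reported.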
Do you have a standard way of deciding which transactions happen at market price? Are there particular deed types (as measured in DocumentTypeStndCode) that best represent the sort of arms-length transactions that are most likely to generate a true market price?
The transaction database contains many fields and flags describing the nature of the event, which are useful for cleaning or analysis purposes. For example, obtaining a clean set of arm’s length consumer-to-consumer home sales will require extensive filtering on DataClassStndCode, DocumentTypeStndCode, IntraFamilyTransferFlag, LoanTypeStndCode, and PropertyUseStndCode, potentially among others. Extensive exploration on your part is required due to the detailed, rich, and nuanced nature of the dataset. For example, refinance records (LoanTypeStndCode = ‘RE’) are coded as a ‘Deed with concurrent mortgage’ (DataClassStndCode = ‘H’) and an intra-family transfer (DocumentTypeStndCode = ‘INTR’). Please note that not all fields are well populated in all areas. Availability often depends on county record-keeping processes and conventions.
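As a starting point for this kind of filtering, the sketch below drops records that the fields above mark as refinances or intra-family transfers. It is an illustrative first pass, not Zillow's cleaning procedure. The codes 'RE' and 'INTR' come from the example above; the flag value 'Y' and the property use code 'SR' are placeholder assumptions that should be checked against the data dictionary.

```python
def looks_arms_length(record, allowed_property_uses=None):
    """Return False for records flagged as refinances or intra-family transfers."""
    if record.get("LoanTypeStndCode") == "RE":
        return False  # refinance
    if record.get("DocumentTypeStndCode") == "INTR":
        return False  # intra-family transfer document type
    if record.get("IntraFamilyTransferFlag") == "Y":  # assumed flag value
        return False
    if allowed_property_uses is not None:
        # Optional whitelist of property use codes supplied by the caller.
        if record.get("PropertyUseStndCode") not in allowed_property_uses:
            return False
    return True


# Made-up sample records; 'SR' is a placeholder property use code.
records = [
    {"TransId": 1, "PropertyUseStndCode": "SR"},
    {"TransId": 2, "LoanTypeStndCode": "RE", "DocumentTypeStndCode": "INTR"},
    {"TransId": 3, "IntraFamilyTransferFlag": "Y", "PropertyUseStndCode": "SR"},
    {"TransId": 4, "PropertyUseStndCode": None},
]
kept = [r["TransId"] for r in records if looks_arms_length(r, {"SR"})]
```

Because fields are not well populated everywhere, a missing value passes the exclusion checks here. Whether that is acceptable depends on the county, so tabulating each field by county before filtering is advisable.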
Our cleaning procedure is extensive, and includes rules applied to the outcome of text matching between buyer and seller names to identify intra-family transfers in the absence of the above flags. We also maintain a proprietary list of non-consumer-to-consumer buyers, such as large institutional buyers. It is a future goal to provide a pre-cleaned and reduced version of ZTRAX for faster analysis. At this time, however, we view the richness of ZTRAX and its accompanying dirtiness as a positive feature.
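Researchers who want a rough stand-in for name-based matching can start with a simple token overlap between buyer and seller names. The sketch below is a deliberately crude heuristic and not the procedure Zillow uses: the function names, the list of ignored tokens, and the sample names are all assumptions made for illustration.

```python
import re

# Tokens that say nothing about family ties (hypothetical, incomplete list).
IGNORED_TOKENS = frozenset({"TRUST", "LLC", "INC", "THE", "AND", "JR", "SR"})


def name_tokens(name):
    """Uppercase a name and return its alphabetic tokens longer than one letter."""
    return {tok for tok in re.findall(r"[A-Z]+", name.upper()) if len(tok) > 1}


def shares_name_token(buyer_name, seller_name):
    """Return True when buyer and seller share at least one informative token."""
    buyer = name_tokens(buyer_name) - IGNORED_TOKENS
    seller = name_tokens(seller_name) - IGNORED_TOKENS
    return bool(buyer & seller)


same_family = shares_name_token("SMITH JOHN A", "Smith, Mary")
unrelated = shares_name_token("DOE JANE", "ACME HOLDINGS LLC")
```

This will flag unrelated parties who share a common first name or surname, and it will miss relatives with different surnames, so it is best treated as a way to generate candidates for closer review.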