As part of a recent analysis, Zillow dove into federal weather data to estimate the average number of pleasant days per year for every city in the United States. Along the way, we found that although raw weather data is incredibly rich and has any number of practical uses, its size and complexity might (understandably, but needlessly) intimidate some would-be users.
So we decided to share our R code, with instructions on how to download, aggregate and clean the raw National Oceanic and Atmospheric Administration (NOAA) weather data and then perform the pleasant days calculations.
First things first, it’s important to give credit where it’s due. Our approach was very much inspired by Kelly Norton’s blog post The Pleasant Places to Live. Thank you!
Like Norton, we defined a pleasant day as a day in which:
(When reproducing the analysis, feel free to adjust these parameters to your own climatological tastes.)
The heat map below shows the pleasant days data after merging it with our city-level data. For each city, we averaged the number of pleasant days per year for the closest few weather stations. (See the methodology section below.)
The results confirm conventional wisdom: Southern California has excellent weather – almost twice as many pleasant days as the runners-up. On the flip side, areas of Montana, Idaho and Nevada have the lowest number of pleasant days. Norton’s more sophisticated tool also allows you to explore the number of pleasant days by month, and supports the idea that northern states experience more enjoyable weather in the summer, while southern states have their best days in the fall and spring.
Among the 25 most populous cities, those in California are in a league of their own. San Francisco, the California city with the least pleasant weather, still has 60 percent more pleasant days per year than Jacksonville, the most pleasant city outside of California. Texas is quite diverse, with cities ranging from 54 pleasant days (El Paso) to 103 (Houston).
Seattle appears seventh on the list despite its legendarily dreary weather, partly because our definition of pleasant doesn’t take cloud coverage into account.[i] But here at Zillow HQ, we really do have very little to complain about.
First, we downloaded and cleaned NOAA’s Global Summary of Day data from the past 18 years (1996 to 2013). For each station, we used the definition of “pleasant” as explained above to determine whether each individual date was pleasant.
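To make the per-date classification concrete, here is a minimal Python sketch (not the R code we shared). The criteria class, the function name and every threshold value are hypothetical placeholders chosen only for illustration. They are not the parameters used in the analysis, and the choice of inputs (daily mean, low and high temperature in degrees Fahrenheit, plus precipitation in inches) is likewise an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PleasantCriteria:
    """Illustrative thresholds only; NOT the values used in the analysis."""
    min_mean_temp_f: float
    max_mean_temp_f: float
    min_low_temp_f: float
    max_high_temp_f: float
    max_precip_in: float


def is_pleasant(mean_f, low_f, high_f, precip_in, criteria):
    """Return True if one station-day satisfies every criterion."""
    return (
        criteria.min_mean_temp_f <= mean_f <= criteria.max_mean_temp_f
        and low_f >= criteria.min_low_temp_f
        and high_f <= criteria.max_high_temp_f
        and precip_in <= criteria.max_precip_in
    )


# Hypothetical settings; adjust them to your own climatological tastes.
example_criteria = PleasantCriteria(
    min_mean_temp_f=55.0,
    max_mean_temp_f=75.0,
    min_low_temp_f=45.0,
    max_high_temp_f=85.0,
    max_precip_in=0.0,
)

mild_day = is_pleasant(65.0, 50.0, 78.0, 0.0, example_criteria)
hot_day = is_pleasant(74.0, 60.0, 92.0, 0.0, example_criteria)
```

Keeping the thresholds in one object makes it easy to rerun the whole pipeline with a different definition of pleasant.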
For each station and calendar day – think January 1 to December 31, not including leap days – we calculated the average number of pleasant days across all available years. For example, if we used the past three years of data (2011 to 2013) instead of the past 18, and if January 1 was pleasant in 2011 but unpleasant in 2012 and 2013 for a certain station, then the average number of pleasant days for January 1 would be 1/3 for that station.
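The calendar-day averaging for a single station can be sketched in a few lines of Python. This is an illustration, not the R code we shared: the function name and the input format (pairs of a date and a pleasant/unpleasant flag) are assumptions made for the example.

```python
from collections import defaultdict
from datetime import date


def calendar_day_averages(observations):
    """Average the pleasant flag per calendar day for one station.

    observations: iterable of (date, was_pleasant) pairs.
    Returns {(month, day): share of available years that were pleasant}.
    """
    pleasant_count = defaultdict(int)
    year_count = defaultdict(int)
    for day, was_pleasant in observations:
        key = (day.month, day.day)
        if key == (2, 29):
            continue  # leap days are left out of the calculation
        year_count[key] += 1
        pleasant_count[key] += int(was_pleasant)
    return {key: pleasant_count[key] / year_count[key] for key in year_count}


# The three-year example from the text: January 1 was pleasant in 2011 only.
observations = [
    (date(2011, 1, 1), True),
    (date(2012, 1, 1), False),
    (date(2013, 1, 1), False),
]
averages = calendar_day_averages(observations)
january_first = averages[(1, 1)]  # equals 1/3
```

Dividing by the number of years actually observed for each calendar day, rather than by a fixed 18, is what "across all available years" amounts to when a station has gaps.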
We then summed these calendar-day averages over all 365 days to come up with the annual average for each station. Stations that didn’t have data for all 365 days were discarded. Note that everything up to this point is replicable using the R code linked in the second paragraph above.
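A small Python sketch of the summing and discarding step follows. It is illustrative only: the function names are hypothetical, and it assumes each station's calendar-day averages arrive as a dictionary keyed by (month, day).

```python
from datetime import date, timedelta


def non_leap_calendar_days():
    """All 365 (month, day) pairs of a non-leap year."""
    start = date(2013, 1, 1)  # any non-leap year works
    days = (start + timedelta(days=offset) for offset in range(365))
    return [(d.month, d.day) for d in days]


def annual_pleasant_days(day_averages):
    """Sum the calendar-day averages, or return None if any day is missing."""
    expected = non_leap_calendar_days()
    if any(key not in day_averages for key in expected):
        return None  # incomplete station: discard
    return sum(day_averages[key] for key in expected)


def station_annual_totals(averages_by_station):
    """Keep only the stations with data for all 365 calendar days."""
    totals = {}
    for station_id, day_averages in averages_by_station.items():
        total = annual_pleasant_days(day_averages)
        if total is not None:
            totals[station_id] = total
    return totals


# Hypothetical stations: one complete, one missing December 31.
complete = {key: 0.5 for key in non_leap_calendar_days()}
incomplete = {key: 0.5 for key in non_leap_calendar_days()[:-1]}
totals = station_annual_totals({"A": complete, "B": incomplete})
```

Here station "A" ends up with 182.5 pleasant days per year, while station "B" is dropped because one calendar day has no data.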
To match station-level data with our city data, we averaged the annual pleasant-day counts of the weather stations nearest each city: the twenty nearest stations for places with many weather stations (provided each is within ten miles of the city center), or otherwise the closest seven stations, regardless of distance.
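One way to read this matching rule is sketched below in Python. This is an interpretation, not our production code: it assumes that a place counts as having "many weather stations" when at least seven stations lie within ten miles, that distances are great-circle (haversine) distances, and that stations arrive as (latitude, longitude, annual pleasant days) tuples. All function and parameter names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def city_pleasant_days(city, stations, min_stations=7, max_stations=20,
                       radius_miles=10.0):
    """Average annual pleasant days over the stations matched to a city.

    city: (lat, lon) of the city center.
    stations: list of (lat, lon, annual_pleasant_days) tuples.
    """
    if not stations:
        raise ValueError("at least one station is required")
    # Rank stations from nearest to farthest.
    ranked = sorted(
        (haversine_miles(city[0], city[1], lat, lon), days)
        for lat, lon, days in stations
    )
    nearby = [days for dist, days in ranked if dist <= radius_miles]
    if len(nearby) >= min_stations:
        chosen = nearby[:max_stations]  # up to twenty stations inside the radius
    else:
        chosen = [days for _, days in ranked[:min_stations]]  # seven nearest
    return sum(chosen) / len(chosen)


# Hypothetical data: a dense city (eight close stations) and a sparse one.
dense = [(0.01 * i, 0.0, 100.0) for i in range(1, 9)] + [(5.0, 0.0, 0.0)]
sparse = ([(0.01 * i, 0.0, 100.0) for i in range(1, 4)]
          + [(5.0 + i, 0.0, 0.0) for i in range(4)])
dense_average = city_pleasant_days((0.0, 0.0), dense)
sparse_average = city_pleasant_days((0.0, 0.0), sparse)
```

In the dense case the distant station is ignored. In the sparse case only three stations fall within ten miles, so the seven nearest are averaged regardless of distance.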
Our numbers were generally higher than Norton’s: for the top 25 cities by population, Zillow estimates were above Norton’s by a median of 32 percent. But when mapped, the pattern is strikingly similar. The discrepancy could be due to a host of factors, but our guess is that it stems from methodological differences in the place-to-station matching (Norton used binned zip codes instead of cities as a whole). Feel free to check out his code on GitHub and compare.
[i] Information on cloud coverage isn’t included in the NOAA Global Summary of Day data set.