Zillow Tech Hub

Bigger, Faster, and More Engaging while on a Budget

A true story of how Zillow uses performance budgeting.

At Zillow, we’re always implementing new ways to empower consumers with data, inspiration and knowledge around the place they call home, and connect them with local professionals who can help.  New features, more property data, big beautiful photos, and videos are just a few ways we like to delight shoppers as well as agents, sellers, and property managers through added page views, agent contacts and traffic.

Developers everywhere in the industry continue to make their sites more engaging by adding more responsive JavaScript, richer images or videos, custom fonts, and unique styles.  Data trends recorded by HTTPArchive.org show that page weight and resource requests have crept upward as a result of ongoing feature development.  Over just the past 5 years, the average transfer size per site has increased from 700KB to over 2100KB, a 3x increase!

Site Growth over Time
Image credit to HTTPArchive.org (http://httparchive.org/trends.php)

This additional weight and resource use contribute to degraded performance on the web.  Over time, these effects can be difficult to notice unless someone is routinely monitoring and trending performance data.  This is a common challenge for every company that wants to track performance improvements in its software and stay competitive in the industry.

We take performance seriously: we have a great set of data-driven performance tools and are empowered to prioritize product performance for our users.  We have a custom Real User Monitoring (RUM) solution with which we trend and monitor client performance in Splunk.  Our own in-house Web Page Test (WPT) instance, connected to our test infrastructure, gives us daily performance trends in development environments.  To help combat potential performance degradation over time, we’ve added SpeedCurve.com to our tool belt.  SpeedCurve offers the ability to implement performance budgets, be alerted to potential issues, and more effectively manage application performance.

In a recent incident, the value and necessity of creating and scrupulously monitoring performance budgets was made all too clear.  Though the circumstance highlighted below is unique, our unexpected discovery serves as a striking example of how vital it is to monitor performance budgets and manage applications relative to them.  This relatively simple safety net enables designers and developers to build engaging features for users while supporting optimal performance and reaching for higher goals in overall product performance.

Implementing Performance Budgets

SpeedCurve enables users to trend common resources and timing on a web page.  For example, it tracks total image, JavaScript, font, HTML, and CSS sizes as well as the request count for each.  Similarly, users are able to trend specific timing marks like page load time or other custom W3C User Timing marks.

After a re-launch early in 2015, SpeedCurve now also gives users the ability to configure a budget against any of these, which appears as a threshold to be maintained during ongoing development.  Some examples might include a total size budget for JavaScript, the time it takes to load a user control, or the number of font requests we allow on a certain page.  An alert can be configured to notify product owners and developers anytime a budget is exceeded, or if something changes drastically.

I configured budgets against all our main pages, for both resource limits and timing events.  Notifications were set to trigger anytime a 5% delta was identified or a defined budget was exceeded.
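The two notification rules can be sketched in a few lines. This is only an illustration of the logic, not SpeedCurve's implementation: the `Budget` class, the `check` function, the metric name, and the 500 KB limit are all hypothetical, and it assumes the 5% delta is measured against the previous measurement.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Budget:
    metric: str              # hypothetical metric name
    limit: float             # budget threshold, in the metric's own unit
    max_delta: float = 0.05  # assumed: 5% change versus the previous measurement


def check(budget: Budget, previous: float, current: float) -> List[str]:
    """Return the alert reasons triggered by the latest measurement."""
    alerts = []
    if current > budget.limit:
        alerts.append("budget exceeded")
    if previous > 0 and abs(current - previous) / previous >= budget.max_delta:
        alerts.append("delta exceeded")
    return alerts


# Hypothetical JavaScript size budget of 500 KB.
js_size = Budget(metric="javascript_kb", limit=500)

print(check(js_size, previous=480, current=490))  # under budget, about 2% change
print(check(js_size, previous=480, current=510))  # over budget, about 6% change
```

Keeping the two rules separate means a large jump is reported even when the page is still under budget, which gives an earlier warning than the threshold alone.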

Performance Budgets in Practice

During my routine morning commute one day, I reviewed the SpeedCurve.com alert email and was shocked, almost in disbelief, at what I saw. Numerous and substantial budget violations were reported for the mobile version of our real estate search page.

  • Page load time was over budget by 61%. Previously just under 11 seconds, load time was now almost 18 seconds.
    Page Load Time Budget
  • Image requests exceeded the budget by 373% as a result of 112 newly introduced images.
    Image Requests Budget
  • Total page size was over budget by 417% at 13.2 MB, an increase of about 11 MB.
    Total Size Budget
  • Image size increased by 708%, up from a typical 1 MB to 12.3 MB, suggesting the added 11 MB of images were the likely culprit.
    Image Size Budget
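The percentages in these alerts are plain ratios against the budget. As a rough sketch, the function names below are hypothetical, and the total size budget is derived from the reported numbers rather than stated anywhere in this post:

```python
def percent_over(budget: float, observed: float) -> float:
    """How far the observed value exceeds the budget, in percent."""
    return (observed - budget) / budget * 100.0


def implied_budget(observed: float, pct_over: float) -> float:
    """Back out the budget from an observed value and its reported overage."""
    return observed / (1.0 + pct_over / 100.0)


# An 11 second load time budget exceeded by 61% gives the "almost 18 seconds".
load_time = 11.0 * (1.0 + 61 / 100.0)
print(round(load_time, 1))                    # 17.7
print(round(percent_over(11.0, load_time)))   # 61

# 13.2 MB at 417% over implies a total size budget of roughly 2.55 MB.
print(round(implied_budget(13.2, 417), 2))    # 2.55
```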

With the mobile user in mind, this dramatic increase in size and slowdown in page load time would seem unacceptable.  Unnecessary page weight comes at the cost of page load time, eats into mobile carrier data plans, and is an infamous drain on mobile battery life.  These violations would culminate in a sub-optimal experience for Zillow’s mobile customers, or so it would seem.  Our investigations were about to get very interesting.

Addressing the Exceeded Performance Budgets

We were quick to engage.  Before I set foot in the office that morning, I had forwarded the SpeedCurve notification email with my thoughts on the potential user experience impact.  The initial response was as expected: the changes were a direct result of new features included in the release the day prior.  Now we needed to find out: did the new features justify the diminished performance results?  What did our users think?

Understanding the Feature Change

The new feature release was intended to improve user experience by increasing the size of the property listing images on the mobile search list page.  The image size changed from a smaller thumbnail to a larger photo with details overlaid on the image, as seen below.

New Feature Comparison
Images on the left are before; images on the right are after, with the full-width property image.

From a business perspective, improved user engagement was seen in Business Intelligence (BI) data during A/B testing.  BI confirmed increased traffic, page views in property listings, and agent contacts.

More information continued to surface.  We had run an A/B test of this feature and then adjusted the A/B test treatment to 100% for evaluation.  I was able to review our RUM data from that time and confirm that there was indeed a 3-4 second improvement in ‘time to onload’ performance.

Confirming the Exceeded Budget Alert

The page load time budget in SpeedCurve was set to 11 seconds.  The budget excess alert reported an excess of 7 seconds.  Our RUM data was suggesting degraded performance for our users on this page as well, about 7 and 10 seconds slower at the 75th and 90th percentiles respectively.

Mobile vs Desktop RUM
90th percentile ‘time to onload’ for Mobile version of the page relative to the Desktop version showing degraded performance post release on 5/6/2015.

Time to Onload for Mobile
75th and 90th percentiles of ‘time to onload’ for the Mobile page, showing page load time degraded by approximately 10 seconds with the unused images.
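For readers less familiar with percentile trending, the sketch below shows one common way to compute such values, the nearest-rank method. The sample timings are made up for illustration and are not Zillow RUM data, and this post does not say which percentile method the Splunk dashboards use.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least p% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical 'time to onload' samples, in seconds.
onload_seconds = [4.2, 5.1, 5.8, 6.4, 7.0, 8.3, 9.9, 12.5, 15.0, 21.7]

print(percentile(onload_seconds, 75))  # 12.5
print(percentile(onload_seconds, 90))  # 15.0
```

High percentiles are trended because they capture the slowest experiences, which an average tends to hide.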

Concern loomed: were we losing mobile traffic due to this degraded performance?

Finding an Explanation

Compared with pre-release norms and the A/B test data, the post-release RUM data and SpeedCurve’s budget excess alert pointed in the opposite direction.  These results begged for an explanation.  What occurred in this release to produce SpeedCurve’s data showing a 4x increase in size, and the ‘time to onload’ slowdown we saw in RUM?

We reviewed our page more closely and identified the source of SpeedCurve’s budget excess alert.  A unique and previously undetected bug in the control A/B test bucket for an upcoming feature was the culprit.  As a result, the page was pulling down extra images, increasing full page load time, and significantly increasing page weight beyond forecasted impact.

How did users respond to the added page weight?

Our BI data would help quantify the impact on user engagement. Much to our surprise, user engagement was undeterred.  As shown below, there is a small decrease in Page Views (PVs) per Session and PVs per Unique User (UU).  On phones there is also a slight decrease in bounce rate and a slight bump in Sessions and UUs. Fluctuations in traffic patterns and other factors during the sample period are potential influencers, so we must consider these small changes insignificant.

Engagement data with the exceeded budgets vs. without (11 days prior vs. 11 days post)

Device Type   UUs      Sessions   Duration/Session (ms)   PVs/Session   Bounce Rate   PVs/UU
Phone         2.31%    2.42%      1.86%                   -2.09%        -1.20%        -1.99%
Tablet        -0.95%   -0.96%     0.98%                   -3.64%        0.52%         -3.65%

The bug in the control bucket was real, but it turned out that the additional images downloaded were never actually seen by the user.  The bug was introduced while implementing an upcoming photo-centric feature that wasn’t ready for users yet.  Additional images for the new feature were being downloaded in the background, but they didn’t affect the original functionality of the page.  We surmise the user didn’t perceive a noticeable short-term degradation in performance.

Similarly, the ‘SpeedIndex’ rating, as shown in SpeedCurve’s trending metrics, is intended to reflect user perception of how quickly content above the page fold is loaded.  A detailed explanation of this rating can be found here.  Just as the BI data shows no significant impact on user engagement, the SpeedIndex rating observed before and after the release shows no meaningful change.  This is further confirmation that the unused images were loaded by the browser but never visually affected the user’s perception of performance.

Contrast the significant size increase with the absence of any effect on the SpeedIndex value.
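Why the rating stays flat follows from how Speed Index is defined: it is the area above the visual completeness curve, so time spent after the viewport is fully painted adds nothing. The sketch below uses a simple step approximation with made-up frames. The `speed_index` function and the frame values are illustrative only, with the final timestamps chosen to mirror the roughly 11 and 18 second load times reported above.

```python
from typing import List, Tuple

Frame = Tuple[float, float]  # (time in ms, visual completeness from 0.0 to 1.0)


def speed_index(frames: List[Frame]) -> float:
    """Sum of (1 - completeness) * interval length over consecutive frames."""
    total = 0.0
    for (t0, completeness), (t1, _) in zip(frames, frames[1:]):
        total += (1.0 - completeness) * (t1 - t0)
    return total


# Hypothetical visual progress: the viewport is fully painted at 2 seconds in
# both cases; only the final onload timestamp differs.
before_release = [(0, 0.0), (1000, 0.5), (2000, 1.0), (11000, 1.0)]
after_release = [(0, 0.0), (1000, 0.5), (2000, 1.0), (18000, 1.0)]

print(speed_index(before_release))  # 1500.0
print(speed_index(after_release))   # 1500.0
```

Both traces score the same because the extra seconds of background downloading happen while visual completeness is already at 100%.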

Depending on which mobile browser they were using, users may have noticed the active download animation running longer than normal.  If they were monitoring their data plan consumption and browsing the page for an extended period, they would have observed an increase in mobile data usage and faster battery drain.

What did we do as a result?

Thanks to the alert from the new performance budgets in place, we were able to quickly engage, investigate, and deploy a hot-fix to eliminate the bug serving unused images.  Coincidentally, user perceived performance was not impacted in this case.  These findings were valuable in learning the potential impact on user perceived performance of the scheduled new feature.  From a design perspective, we can deliver more images to delight our users.  From an engineering perspective, we can do this in a manner that does not impact perceived performance.  From a business perspective, we can improve engagement to better empower consumers with the data they desire.

Zillow values how performance plays a role in the consumer experience, invests in the latest tools and analysis to make performance a priority, and is enabled to act quickly to fix real user issues.  Performance budgets will continue keeping us in check and lend additional opportunity to pursue more rigorous performance improvement goals benefiting our users.

Lessons Learned

Create performance budgets for your popular and regularly changing pages.  Review performance budget violations early and always.  Compare performance pre- and post-release and update budgets accordingly.

Excess resources during page load may not affect user perception of performance, even at the roughly fourfold increase in page size seen in this case.  Delivering content asynchronously, or after the above-the-fold content, is a design solution for intelligently delivering content without impacting engagement or perceived performance.

Reaffirmation: ‘time to onload’ is important but not a good target for user perceived performance.  Fluctuations in page load time as a result of this bug, along with the other exceeded budget notifications, helped our investigation.  However, BI data and the ‘SpeedIndex’ rating confirmed there was no noticeable impact on user perceived performance regardless of the fluctuations in ‘time to onload’.  As a recommendation, insert a custom Above-Fold-Time (AFT) W3C User Timing mark on the page to budget actual user perceived performance.  A good supporting reference can be found here.

Respect alternative impacts to user performance:

  1. Be cautious not to deliver excessive content. In this case, the 4x increase, or almost 11 MB of additional images per mobile search page result, was a significant and unnecessary increase in users’ mobile data plan usage.
  2. Be mindful of mobile battery consumption. The wireless radio in mobile devices is the top consumer of battery power next to the device display.  Excess content delivered during page load keeps the wireless radio active longer, unnecessarily consuming energy and decreasing user battery life.