Next Best Action platform: democratizing personalization with contextual bandits

Ruomeng Xu

Xiaojing Zhu

Dec 08 2024

The process of buying, selling or renting homes can be complicated, often leaving buyers, sellers and renters feeling unsure and overwhelmed. As a multifaceted platform, Zillow connects these buyers, sellers and renters with local market experts, and provides the information, technologies and services needed to help people confidently navigate their move. Given our varied offerings and diverse customer base, it’s critical that we present a personalized and contextualized view of a customer’s likely next actions across the various touchpoints within the Zillow ecosystem. We must also surface the content that resonates with each customer most, whether they encounter it on the Zillow app or website. This approach lets Zillow simplify the complex real estate process by guiding customers in the right direction.

With this mission in mind, we developed the Next Best Action (NBA) platform, which enables product and marketing teams to easily personalize and optimize customer communications, messages and calls to action (CTAs) that point customers to their most important and relevant next steps.

In this blog post, we will cover the NBA vision, the core AI model used, the infrastructure and its performance in a specific case, along with our plans for the future.

NBA Vision

Given the variety of services, technologies and information available to support customers throughout their moving journey, our product and marketing teams create various entry points and value-proposition messages across the Zillow app and website. For example, the “Reason to Believe” tiles on the homepage help customers navigate Zillow’s main product offerings. And home details pages prompt users to take a tour, contact an agent or talk to a lender for pre-qualification. (See Figure 1 for two example touchpoints.)

Figure 1: Left panel: The module of ‘Reason to Believe’ tiles on the Zillow homepage. Right panel: The module of calls to action in a Zillow home detail page.

Key Challenges

Marketing and product teams often use a manual, iterative approach to curate messages; rely heavily on manually-tuned rules for targeting users with the most effective communications and CTAs; and roll out experiences globally after A/B testing. This approach has demonstrated significant business value during the initial product launch phase, but it becomes more challenging to meet customers’ needs as the product and customer base grow.

Challenge 1: Lack of personalization — Teams rely on A/B tests to find winning messages, often adopting a “winner takes all” approach. This fails to address the full spectrum of customer needs. For example, if a message succeeds with 60% of customers, it means the needs of the other 40% of customers aren’t being prioritized, which can reduce customer retention in the long run.

Challenge 2: Slow speed & increased effort to learn — For teams without personalization resources, optimizing messages via sequential A/B testing means iterating through rule-based hypotheses one at a time. This approach slows down learning and requires additional engineering and data science resources. Even teams with sufficient AI resources face lengthy experimental cycles that require significant time and effort to collect data, build and evaluate models, and conduct A/B tests, often taking weeks or months to complete the process.

Challenge 3: Opportunity cost of sequential testing — As new products and CTAs are introduced and creative customer-targeted messages are designed, stakeholders must evaluate these new variants and calibrate them against what’s currently live on the website, app or other property. Exploring the viability of new products, CTAs or messages can negatively impact overall engagement if they prove to be irrelevant or nonviable.

Challenge 4: Stale rules and models — Manually-tuned rules and static models are based on the data collected for a specific time period, often failing to adapt to shifting customer needs and business priorities.

Our Vision

The NBA platform aims to overcome the challenges of the traditional iterative and sequential approach to optimizing communications. Our goal is to enable product, marketing and business teams so they can easily and efficiently create personalized experiences that target users with relevant messages and dynamically adjust to changing business goals. More specifically, with NBA, we have focused on achieving the following objectives.

Provide personalized experience: Instead of the “winner takes all” solution, NBA endeavors to provide content, modules and calls to action that are most relevant to each individual customer based on their personal journey and situation.
Drastically reduce the lengthy and difficult experimental cycle: After initial integration, the NBA platform can significantly reduce the time needed to optimize relevant user messages from months to weeks, or even days.
Reduce the risk of experimenting with new variants: NBA allows stakeholders to dynamically introduce additional variants, because it will automatically assess their viability and relevance within the experience, and learn its way into a perpetually optimized personalization strategy. This fully-automated process minimizes the risk of irrelevant messages and maximizes the benefits of relevant ones.
Adapt to changing business objectives: Business requirements may not always be clear upfront, and priorities can frequently evolve. NBA is designed to empower teams to iteratively adjust their optimization objectives on the fly to meet changing business needs.

NOTE: For simplicity, in the following sections we will use “action” to generally refer to any messages, CTAs, communications or product offerings.

Core AI Model for NBA

Bandit algorithms, which optimize the explore/exploit tradeoff, are a technique commonly used to achieve such a vision. Contextual bandit algorithms are a special type of bandit algorithm that tailors actions to specific user contexts. In this section, we introduce the core AI model for NBA, the contextual bandit model (CXB), as well as the core AI capabilities the model enables.

Contextual Bandit Model (CXB) Definition

We formulate the action personalization as a contextual bandit problem, where at each time step t ∈ {1, 2, …, T} (or trial t), a learner repeatedly:

observes a context x_t describing both user features and action features, and a set of candidate actions A;
chooses an action a_t ∈ A based on the previously inferred reward function f(r | x, a), which estimates the reward from the context and action. The chosen action is either the one with the highest estimated reward, argmax_{a ∈ A} f(r | x_t, a), or a suboptimal action selected for exploration to gather more information;
observes a reward r_t for the chosen action a_t;
improves the accuracy of the reward estimate f(r | x, a) using the collected feedback data (x_t, a_t, r_t).
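The four steps above can be sketched as a generic loop. This is a minimal illustration rather than NBA's implementation: `run_bandit`, `choose`, `get_reward` and `update` are hypothetical names standing in for the exploration policy, the client system that reports user feedback, and the model refresh.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# One logged feedback record: (context x_t, chosen action a_t, reward r_t).
Record = Tuple[dict, Hashable, float]


def run_bandit(
    contexts: Sequence[dict],
    actions: Sequence[Hashable],
    choose: Callable[[dict, Sequence[Hashable]], Hashable],
    get_reward: Callable[[dict, Hashable], float],
    update: Callable[[List[Record]], None],
) -> Tuple[float, List[Record]]:
    """Run the observe / choose / reward / update loop once per context.

    All three callables are hypothetical stand-ins: `choose` applies the
    current policy (exploit or explore), `get_reward` returns user feedback,
    and `update` refreshes the reward estimate f(r | x, a) from the log.
    """
    log: List[Record] = []
    total = 0.0
    for x_t in contexts:                  # 1. observe context x_t and candidates A
        a_t = choose(x_t, actions)        # 2. choose a_t in A
        r_t = get_reward(x_t, a_t)        # 3. observe reward r_t
        log.append((x_t, a_t, r_t))
        total += r_t
        update(log)                       # 4. improve f using the collected data
    return total, log
```

For simplicity the sketch calls `update` after every trial; as described below, NBA instead refits its base model in a daily batch.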

The goal of the bandit algorithm is to find a policy that maximizes the expected sum of rewards over time. This policy dictates the action choice at each time step and improves over time as exploration gathers more feedback.

Figure 2: A diagram for personalizing users’ next actions through the CXB model.

Specifically, the CXB model we built for the NBA platform is composed of three key components, described below (Figure 2 illustrates how they fit together).

Base machine learning (ML) model — The base ML model learns the mapping f: (context, action) → reward, which is used to estimate the reward under any given context x for each possible action a ∈ A. The choice of mapping function is flexible: it can be a linear model, decision trees, a neural network, etc. We currently use a gradient-boosting tree regression model as the base learner to model how the reward depends on the context and the action.
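To make the mapping concrete, one common way to let a single regressor score every candidate is to append an encoding of the action to the context features and predict once per action. The sketch below assumes a one-hot action encoding and a generic `predict` callable; both are hypothetical, and NBA's gradient-boosting regressor and its actual action features are not reproduced here.

```python
from typing import Callable, Dict, List, Sequence


def featurize(context: Sequence[float], action: str, all_actions: Sequence[str]) -> List[float]:
    """Concatenate numeric context features with a one-hot encoding of the action."""
    one_hot = [1.0 if a == action else 0.0 for a in all_actions]
    return list(context) + one_hot


def score_actions(
    predict: Callable[[List[float]], float],
    context: Sequence[float],
    actions: Sequence[str],
) -> Dict[str, float]:
    """Estimate f(r | x, a) for every candidate action a under context x."""
    return {a: predict(featurize(context, a, actions)) for a in actions}
```

Because the action enters as input features, one model scores the whole eligible set; a tree-based regressor can also learn interactions between the action indicators and the context features, which is what makes the resulting scores differ from user to user.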

Exploration-exploitation — Exploitation leverages the user feedback collected so far and chooses the action that maximizes the estimated immediate reward under the learned base ML model. Exploration tries different options to gain insight into the potential reward each action can offer. Only exploring misses out on immediate rewards, and only exploiting misses out on discovering actions with potentially higher rewards. Therefore, it is the balance of the two that matters in finding an optimal solution that is both sustainable and flexible, able to endure and adapt to a dynamic environment. There are a variety of exploration strategies to facilitate learning (e.g., ε-greedy, upper confidence bound, Thompson sampling). We currently adopt the ε-greedy strategy: with probability ε, it explores by choosing an action uniformly at random from all eligible actions, and with probability 1 − ε, it exploits by choosing the action with the highest estimated reward.
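The ε-greedy rule can be written in a few lines. This sketch takes precomputed reward estimates per action (for example, the output of the base model); the function name and signature are hypothetical.

```python
import random
from typing import Dict, Hashable, Optional


def epsilon_greedy(
    scores: Dict[Hashable, float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> Hashable:
    """Pick an action with epsilon-greedy exploration.

    With probability epsilon, choose uniformly at random among all eligible
    actions; otherwise choose the action with the highest estimated reward.
    """
    if not scores:
        raise ValueError("no eligible actions")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(list(scores))   # explore: uniform over eligible actions
    return max(scores, key=scores.get)    # exploit: highest estimated reward
```

Because the exploratory draw can also land on the greedy action, the greedy action is served with total probability 1 - ε + ε/K for K eligible actions, and every other action with probability ε/K.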

Incremental Learning — The learning component continuously updates the base ML model using newly collected user feedback from the exploration-exploitation step, together with the user/action context, so that the model adapts to changes over time. In some cases, base ML models can be learned incrementally from streaming data (e.g., SGD or mini-batch updates), or relearned repeatedly on updated batch data (e.g., a full refit). We currently adopt the latter approach, retraining the model daily in batch on the data collected within a sliding window. By default, the sliding window covers the most recent few days of data.
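The daily sliding-window refit amounts to a date filter followed by a full refit. In this sketch the three-day default window and the `fit` callable (any function that trains a model from events, such as a gradient-boosting fit) are placeholders; the post only says the window covers the most recent few days.

```python
from datetime import date, timedelta
from typing import Callable, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")
# Logged event: (event date, context, action, reward).
Event = Tuple[date, dict, str, float]


def window(events: Sequence[Event], today: date, days: int) -> List[Event]:
    """Keep only events from the `days` days before `today` (today excluded)."""
    start = today - timedelta(days=days)
    return [e for e in events if start <= e[0] < today]


def daily_refit(
    events: Sequence[Event],
    today: date,
    fit: Callable[[List[Event]], Model],
    days: int = 3,  # hypothetical default for "the most recent few days"
) -> Model:
    """Full refit of the base model on the sliding window (no incremental update)."""
    return fit(window(events, today, days))
```

Older feedback drops out of the window automatically, which is what lets a full-refit model track shifting preferences without an explicit forgetting mechanism.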

Overall, the CXB approach uses an online learning algorithm that sequentially selects the next best action for each user, or explores alternatives, based on contextual information about users and actions. It simultaneously adapts this strategy based on newly collected user feedback and changes to the action pool, in order to maximize total reward.

AI Capabilities

The following AI capabilities are unlocked in the CXB model in order to fulfill the NBA vision.

Personalization — The NBA platform personalizes users’ next actions to increase click-through rates compared to traditional marketing’s “winner takes all” strategy. In the current setting, the CXB model operates dynamically, using the gradient-boosting tree regressor and ε-greedy exploration to deliver personalized actions. This method ensures that the majority of users receive the best estimated action, while a small portion of users is reserved for random exploration aimed at uncovering the potential of other actions.
Cold start — The NBA platform can be cold-started, learning and adapting on its own within a few days based on newly collected reaction data. When faced with a new use case involving brand-new actions that have never been exposed to users, traditional ML models typically require separate processes for data collection and offline model training before providing recommendations, which often involves a lengthy randomized test to collect data. In contrast, the CXB model begins by initializing the base model regressor as a uniform distribution, or as a predefined prior distribution with respect to input features. This approach exposes all possible actions to users, and as user reaction data flows in, the regressor is automatically refreshed to reflect user preferences for certain messages. Consequently, upfront random data collection and offline modeling are no longer needed, and applying NBA to new use cases takes less time and effort.
Exploration-exploitation and adaptiveness — The NBA platform supports a changing set of actions (e.g., adding new actions) entering the contextual bandit ecosystem, and retains those lessons that it has learned from previous iterations. Exploration facilitates new insights for these actions, and exploitation leverages previous learning. The CXB model adapts to provide personalization for the updated set of actions. Additionally, the model is updated regularly with data about microeconomic changes in the highly dynamic real estate market. The exploration component of the CXB model effectively breaks the feedback loop typically found in recommender systems by recommending actions beyond the model’s normal suggestions, thus facilitating the collection of diverse, high-quality data that will enhance further model updates.
Business “steering wheel” — The NBA platform optimizes overall business objectives, ensuring intelligent trade-offs across different lines of business. The adaptive nature of the CXB model allows client teams to modify the reward of each action on the fly, and the system can swiftly adapt to new solutions after optimization objectives have been updated.
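The cold-start idea can be illustrated with a deliberately simplified, non-contextual sketch: each action's reward estimate starts at a shared prior, so before any feedback all actions tie and random tie-breaking spreads exposure evenly; as clicks arrive, each estimate moves toward that action's observed mean. The class name and the `prior_weight` pseudo-count are hypothetical, and NBA's actual regressor conditions on input features, which this sketch omits.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


class PriorSmoothedEstimator:
    """Per-action reward estimate that starts from a prior (cold-start sketch).

    With no data every action scores `prior_mean`, so choosing the best action
    with random tie-breaking exposes all actions uniformly. `prior_weight` is a
    hypothetical pseudo-count: how many observations the prior is worth.
    """

    def __init__(self, prior_mean: float = 0.0, prior_weight: float = 1.0) -> None:
        self.prior_mean = prior_mean
        self.prior_weight = prior_weight
        self.sums: Dict[str, float] = defaultdict(float)
        self.counts: Dict[str, int] = defaultdict(int)

    def update(self, feedback: Iterable[Tuple[str, float]]) -> None:
        """Add (action, reward) observations from newly collected feedback."""
        for action, reward in feedback:
            self.sums[action] += reward
            self.counts[action] += 1

    def estimate(self, action: str) -> float:
        """Blend the prior with the observed rewards, weighted by pseudo-count."""
        w = self.prior_weight
        return (w * self.prior_mean + self.sums[action]) / (w + self.counts[action])

    def best(self, actions: Iterable[str], rng: Optional[random.Random] = None) -> str:
        """Return a top-scoring action, breaking ties uniformly at random."""
        rng = rng or random.Random()
        acts = list(actions)
        top = max(self.estimate(a) for a in acts)
        return rng.choice([a for a in acts if self.estimate(a) == top])
```

A larger `prior_weight` keeps the estimates near the prior for longer, trading slower adaptation for less sensitivity to the first few noisy clicks.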

Infrastructure

To support the aforementioned AI capabilities across various use cases, we built the NBA infrastructure, which includes a batch processing and an online serving flow. The batch infrastructure handles:

data processing
model retraining
decision computation
online storage updates with the latest decision data

The online serving component processes requests from client teams and retrieves personalized decisions from online storage. The response information from the NBA orchestration service and the reward information from client systems are both critical to the NBA platform’s learning capabilities. For this reason, the online server features a library that will help render the required information into a standardized format for batch processing.
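The batch/online split described above can be sketched as a precomputed lookup table plus a logging step. Everything here is hypothetical: a plain dict stands in for online storage, `DecisionLog` is an invented schema for the standardized format the post mentions, and falling back to a default ordering for users without a precomputed decision is an assumption, not documented NBA behavior.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List, Sequence, Tuple


@dataclass
class DecisionLog:
    """Hypothetical standardized record that later joins decisions to rewards."""
    user_id: str
    use_case: str
    ranked_actions: List[str]
    source: str  # "precomputed" or "fallback"


def compute_decisions(
    user_ids: Sequence[str],
    rank: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Batch step: precompute a ranked action list per user for online storage."""
    return {uid: rank(uid) for uid in user_ids}


def serve(
    store: Dict[str, List[str]],
    user_id: str,
    use_case: str,
    default: List[str],
) -> Tuple[List[str], str]:
    """Online step: look up the decision and emit a JSON log line for batch jobs."""
    if user_id in store:
        ranked, source = store[user_id], "precomputed"
    else:
        ranked, source = list(default), "fallback"
    record = DecisionLog(user_id, use_case, ranked, source)
    return ranked, json.dumps(asdict(record))
```

Logging what was served, and from which source, is what allows the batch flow to join decisions with the reward information reported by client systems.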

Use Case: ‘Reason to Believe’ Tiles on Zillow Homepage

In this section, we showcase the performance of the CXB model in a specific use case to demonstrate its effectiveness in application.

Use Case Overview

Each month, the Zillow homepage serves as the starting point for millions of customers’ real estate journeys. The “Reason to Believe” (RTB) tiles on the homepage (see the left panel in Figure 1) help customers navigate Zillow’s main product offerings without needing to use a complicated top navigation menu. Before the integration with NBA, all eligible customers saw the same RTB messages in the same order, regardless of their past behavior or persona. We leveraged the NBA platform to optimize the content and ordering of the RTB tiles on the Zillow homepage, chosen from four possible candidates: Home Loan Prequalification, Buy with Agent, Rent and Sell. Figure 4 below shows two different user experiences before and after the NBA personalization. In this use case, each RTB corresponds to one action in the CXB model. The model gradually learns each user’s preferences for RTBs by collecting feedback, taking into account the user’s context information, including their home-browsing and engagement history, as well as the business value of clicks on different types of RTBs.

Figure 4: A comparison in user experiences without NBA and with NBA.
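Since the RTB use case orders four tiles rather than picking one, the selection step has to produce a ranking. The post does not say how exploration is applied to orderings; the sketch below assumes the simplest extension of ε-greedy (a fully random order with probability ε, otherwise descending by estimated reward), and the score values in any usage are made up.

```python
import random
from typing import Dict, List, Optional

# The four candidate RTB tiles named in the post.
RTB_TILES = ["Home Loan Prequalification", "Buy with Agent", "Rent", "Sell"]


def rank_tiles(
    scores: Dict[str, float],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Order tiles by estimated reward, with probability epsilon of a random order.

    Extending epsilon-greedy from one pick to a full ordering this way is an
    assumption for illustration, not NBA's documented scheme.
    """
    rng = rng or random.Random()
    tiles = list(scores)
    if rng.random() < epsilon:
        rng.shuffle(tiles)                                   # explore: random order
        return tiles
    return sorted(tiles, key=scores.get, reverse=True)       # exploit: best first
```

Under this scheme, the highest-scoring tile for a user lands in the top position whenever the model exploits, which is the behavior the experiments below measure as a shift of clicks toward the top positions.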

Experiment Results and Interpretation

Based on this use case we ran multiple experiments in order to validate each of the NBA platform’s AI capabilities. In this subsection, we present the key results from both online A/B testing and offline evaluations.

Figure 5: A comparison on NBA relative lift in clicks to demonstrate the AI capabilities of personalization and cold start.

Personalization. To showcase the personalization capability of the NBA platform, we conducted an online A/B test, comparing the static RTB orderings against the personalized ordering from NBA. The results are visualized as red bars in Figure 5, which shows that compared to the status quo “winner takes all” baseline, NBA successfully increased the total clicks, as well as shifted more clicks to the top positions. This demonstrates that NBA can offer an improved and personalized user experience by placing actions that users are more likely to engage with in the prominent top positions.

Cold start. To validate the cold-start capability of the NBA platform, we initialized the CXB model in two ways: a cold start that randomly exposes different RTBs to users, and a warm start that recommends actions based on a pre-trained model (using data collected from the randomization test). We then compared the two in an online A/B test. The cold-start bandit achieved performance comparable to the warm-start bandit within five days of exploration (displayed in Figure 5). This suggests that separate random data collection for new use cases can potentially be eliminated, and that experimental cycles can be drastically reduced without compromising the quality of user messages/actions.

Figure 6: A comparison of the total RTB clicks per user between NBA without new action (blue) and NBA with new action (orange), to demonstrate the AI capabilities of adaptiveness through exploration and exploitation. Left panel: The new action added has a low CTR. Right panel: The new action added has a high CTR.

Exploration-exploitation and adaptiveness: To show that NBA can support a changing set of actions, we conducted offline experiments that simulate adding a new action to the contextual bandit ecosystem, and compared the performance of NBA with the new action against NBA without it. The experiments show that if the new action has a low click-through rate (CTR), the CXB model captures the same number of clicks as the model without the new action (Figure 6, left panel). If the new action has a high CTR, the bandit model adapts to capture more clicks than the model without it (Figure 6, right panel). This demonstrates that, whichever new actions are added, the model automatically explores their viability within the experience and adapts to provide a good personalized ranking among all actions, so adding a new action does not hurt total reward.
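The shape of this offline experiment can be reproduced with a toy simulation. This is not the simulator used for Figure 6: it drops context entirely, uses a running-mean estimate per action, and the CTR values in any usage are invented. It only shows how an ε-greedy bandit reacts when an action is added to its pool.

```python
import random
from typing import Sequence


def simulate_clicks(true_ctrs: Sequence[float], epsilon: float, n_users: int, seed: int = 0) -> int:
    """Simulate a non-contextual epsilon-greedy bandit and return total clicks.

    Each action's true click-through rate is known only to the simulator; the
    bandit sees Bernoulli click feedback and keeps a running mean per action.
    """
    rng = random.Random(seed)
    k = len(true_ctrs)
    counts = [0] * k
    means = [0.0] * k
    clicks = 0
    for _ in range(n_users):
        if rng.random() < epsilon:
            a = rng.randrange(k)                         # explore
        else:
            a = means.index(max(means))                 # exploit
        r = 1 if rng.random() < true_ctrs[a] else 0      # simulated click
        counts[a] += 1
        means[a] += (r - means[a]) / counts[a]           # incremental mean update
        clicks += r
    return clicks
```

Running it once with a base set of CTRs and once with an extra high-CTR action shows the bandit shifting traffic to the newcomer. In this toy, a low-CTR addition costs only the traffic that exploration sends to it (about ε/K of users), which is consistent with the comparable click totals in the left panel of Figure 6.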

Figure 7: A comparison of NBA relative lift in business values to demonstrate the AI capability of the business “steering wheel”.

Business “steering wheel”. To validate that the NBA platform can adapt to changing business objectives on the fly, we conducted an online A/B test. Both buckets were initially set with equal rewards for each action; then one bucket was switched to unequal rewards reflecting the actual business value of a click on each action. Compared to the equal-reward bucket, the business-value bucket converged to the optimal solution within three days of the reward update, achieving the lift depicted in Figure 7. This resulted in an overall increase in the business value of clicks and increased the value derived from the first two positions. This validates the NBA platform’s capability to adapt to changing business goals, optimize on the fly and drive more high-value clicks.
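One way to realize the reward “steering wheel” is to keep raw click logs and convert them into training rewards with a per-action value table at refit time. This is a sketch under assumptions: the function name, the table format and the default value of 1.0 for actions missing from the table (the equal-reward baseline) are hypothetical, and the post does not say how NBA stores or applies reward values.

```python
from typing import Dict, List, Tuple

# Logged event: (context, action shown, whether it was clicked).
ClickEvent = Tuple[dict, str, bool]


def label_rewards(
    events: List[ClickEvent],
    action_values: Dict[str, float],
) -> List[Tuple[dict, str, float]]:
    """Turn logged clicks into training rewards using a per-action value table.

    A click on action a earns action_values[a] (default 1.0, the equal-reward
    baseline); no click earns 0. Editing the table changes the training
    targets at the next refit.
    """
    return [
        (x, a, action_values.get(a, 1.0) if clicked else 0.0)
        for x, a, clicked in events
    ]
```

Because the table is applied when labels are built, changing an action's value takes effect at the next refit without any change to how clicks are logged.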

What’s Next?

In this section, we will outline the exciting future work that will help unlock the next phase of the NBA platform. This includes real-time serving, test-free impact measurement and scalable action serving for multiple use cases.

Real-time NBA with Additional Context

We have already launched the initial version of the NBA system, which pre-computes decisions offline and serves them online upon request. However, the system cannot yet process real-time context information, nor refresh its model more frequently (e.g., per session instead of daily). We believe that building an NBA system that takes in real-time user features and context, then computes decisions and updates the model in real time, will significantly boost the performance of the platform, both in terms of personalization and business-value optimization.

Understand Treatment Effect Without A/B Tests

One of the key advantages of NBA is its adaptiveness. Because the bandit model learns online and adjusts to its optimal state after changes (e.g., adding a new action), we can eliminate the need for traditional A/B tests. However, businesses sometimes want to understand the ROI of iterations, similar to what traditional A/B tests provide. In response, we are developing an analysis method that will offer similar insights and impact measurement.
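For context, one standard way to estimate the value of a policy from logged bandit data, without running a dedicated A/B test, is inverse propensity scoring (IPS). We are not suggesting this is the method under development; it is shown only because ε-greedy logging makes the serving propensities easy to compute. The function names and record layout are hypothetical.

```python
from typing import Callable, Hashable, List, Sequence, Tuple

# Logged record: (context, eligible actions, chosen action, reward, propensity).
Logged = Tuple[dict, Sequence[Hashable], Hashable, float, float]


def eps_greedy_propensity(chosen: Hashable, greedy: Hashable, epsilon: float, k: int) -> float:
    """Probability that epsilon-greedy served `chosen` when `greedy` was the top action."""
    return (1 - epsilon) + epsilon / k if chosen == greedy else epsilon / k


def ips_value(logs: List[Logged], target: Callable[[dict, Sequence[Hashable]], Hashable]) -> float:
    """Estimate the average reward a deterministic `target` policy would earn.

    Only records where the logged action matches the target's choice
    contribute, each reweighted by 1 / (logging propensity).
    """
    if not logs:
        raise ValueError("no logged data")
    total = 0.0
    for x, actions, a, r, p in logs:
        if target(x, actions) == a:
            total += r / p
    return total / len(logs)
```

Rarely served actions get large weights, so IPS estimates can be noisy; and for ranked layouts such as the RTB tiles, the propensity of a whole ordering is more involved than in this single-action case.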

Scalable and Coordinated NBA Model Architecture

As Zillow develops the real estate super app, an integrated digital experience that connects all of the fragmented pieces of the moving process on one transaction platform, a significant challenge arises for NBA: how to serve distinct needs for all use cases across different touchpoints, while ensuring that prompts do not conflict. To this end, we are exploring a taxonomy approach that will effectively organize use cases and lines of business, and serve as a backbone to scale the NBA model across different touchpoints.

Who Made This Happen?

This project is a true team effort, and there are many people to thank. A big thank-you to the MarTech, Marketing, Data Engineering and Growth AI teams for bringing the NBA vision to life. Special thanks to Matt Schuerman and Aditya Sundaram for leading the engineering effort that made this incredible platform a reality for Zillow teams and customers. A special thanks also to our cross-functional product and engineering leaders — Rebecca Brown, Dili Wu, Sam Jorgensen, Ondrej Linda, Dan Glasser, Deepthi Kondapalli and Ayse Kulahci — for navigating this NBA journey. Similarly, thanks to Connie Jimenez and Kane Merrill for their partnership in marketing experimentation.
