Improving the home selling & buying experience by containerizing ML deployments

With Zillow Offers we’re transforming how real estate is bought and sold. Underpinning it is a process we follow that ensures every seller and buyer receives a delightful and consistent experience. Unsurprisingly, this process is ripe for automation, and our approach on the Zillow Offers Machine Learning team has been to develop a containerized platform for rapidly developing, validating, and deploying models in response to evolving business requirements.
After deploying our first two models using Zillow’s existing AWS EC2-backed serving architecture, we quickly realized the difficulties our serving tier would face. A given model might be called only once per offer request, while another might be called thousands of times per minute. Balancing cost, availability, and burst scalability soon led us to investigate an alternative approach.
Our initial software architecture utilized a monolithic design, which allowed us to iterate quickly without needing to provision new hardware. However, as new models were added to it, dependency management became a complex issue. The need to deploy each model separately, encapsulate its dependencies, and guarantee reproducible predictions drove us to adopt containers.
Once we identified these challenges, we wanted to approach the solution in a way that wouldn’t require costly re-tooling or re-training in the near future. A key step here was agreeing on nomenclature to describe the key concepts and components. We settled on projects, variants, and endpoints: each project can have multiple variants, with each variant supporting multiple endpoint interfaces for real-time scoring and debugging.
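To make the hierarchy concrete, here is a minimal sketch of how these concepts relate to one another; the class and field names are illustrative placeholders rather than our actual internal schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Endpoint:
    # An interface a variant exposes, e.g. real-time scoring or debugging.
    name: str  # e.g. "score" or "debug"
    path: str  # e.g. "/v1/score"

@dataclass
class Variant:
    # A specific modeling approach within a project.
    name: str
    endpoints: List[Endpoint] = field(default_factory=list)

@dataclass
class Project:
    # A research or business problem area that owns one or more variants.
    name: str
    variants: List[Variant] = field(default_factory=list)
```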
With these concepts defined, we needed to turn them into a repeatable pattern. We turned to cookiecutter, a utility for building templatized projects, to provide us with a design pattern for uniformly spawning new repositories. The benefits of this approach were the ease of generating new projects and the ability to customize them. However, rolling out updates to the pattern itself can be tricky and requires a great deal of attention. We simplified our process by devising a set of internal libraries and tooling shared across projects, centralizing code that performs repetitive heavy lifting and abstracting away much of the plumbing between components. This allowed model developers to focus on the tasks they’re best at!
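As a concrete illustration, a new project can be generated from a template like ours through cookiecutter’s Python API; the template URL and context keys below are hypothetical stand-ins for our internal archetype.

```python
from cookiecutter.main import cookiecutter

# Generate a new project repository from a (hypothetical) internal template.
# extra_context pre-fills the template prompts and no_input skips the interactive questions.
cookiecutter(
    "https://gitlab.example.com/ml-platform/ml-project-archetype.git",  # placeholder URL
    no_input=True,
    extra_context={
        "project_name": "offer-valuation",
        "variant_name": "baseline",
    },
)
```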
Two of these tools are Pillar and Scorcerer. Pillar encapsulates utilities to construct training workflows, handles our code variant structure, and ingests hyperparameter and training data configurations for jobs executed in AWS SageMaker. Scorcerer centralizes the tooling for exposing endpoints to our consuming clients. Scorcerer is infamous internally for being mispronounced, but it provides a magical experience and, with new services onboarding, is here to stay!
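Pillar’s actual interface is internal, but conceptually it ingests a configuration along the lines of the sketch below and turns it into a SageMaker training job; every name and path here is hypothetical.

```python
# Hypothetical variant configuration of the kind a library like Pillar might ingest.
training_config = {
    "project": "offer-valuation",
    "variant": "baseline",
    "hyperparameters": {
        "max_depth": 6,
        "learning_rate": 0.1,
        "num_round": 200,
    },
    "channels": {
        # Training data channels handed to SageMaker (see the pipeline description below).
        "train": "s3://example-bucket/offer-valuation/baseline/train/",
        "validation": "s3://example-bucket/offer-valuation/baseline/validation/",
    },
    "instance_type": "ml.m5.xlarge",
}
```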
When built, each project packages all of its dependencies into a single Docker image and exposes multiple entrypoints that are called during steps in the training pipeline. By reusing the same image between training and serving we flatten the required dependency set (at both the system and application level), which allows us to take a model we’ve trained within SageMaker, serialize it to our Data Lake, and deploy it on Kubernetes for real-time serving. Without Docker, debugging models trained in the past was fraught with errors, as dependency sets were often incompatible. Now, as we train, we version the serialized model and Docker image together to streamline this process.
Entrypoints for a variant Docker image that’s used by pipeline steps.
By default, each image exposes entrypoints for the training and serving steps of this pipeline.
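As a rough sketch of the idea (not our exact entrypoint names), a single image can route between training and serving behavior based on the argument it is invoked with:

```python
#!/usr/bin/env python
"""Illustrative entrypoint dispatcher baked into a variant Docker image."""
import sys

def train():
    # Invoked by SageMaker during the training pipeline step.
    print("running training...")

def serve():
    # Invoked when the same image is deployed for real-time scoring.
    print("starting scoring server...")

if __name__ == "__main__":
    command = sys.argv[1] if len(sys.argv) > 1 else "serve"
    {"train": train, "serve": serve}[command]()
```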
A new project is generated when a new research area is identified. The source code template is generated and customized using our cookiecutter-backed archetype pattern. Initial research is performed in notebooks stored within the project, and any reusable code is pulled out into separate modules. We use Gitlab’s built-in CI/CD tooling to construct the pipelines that build, test, and publish our Docker images to repositories on AWS ECR. This approach has drastically simplified our route from initial research to production by maximizing code reuse between these environments.
Airflow operates as our training pipeline orchestrator, initiating one or more Spark jobs that take raw datasets from our Data Lake and transform them into datasets tailored for training. We then feed these datasets into SageMaker as separate channels. At this point we typically complete any feature engineering required for the specific model variant.
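A stripped-down version of that orchestration might look like the following Airflow DAG; the operators, connection details, and S3 paths are illustrative rather than our production definitions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

def launch_training(**_):
    # Kick off a SageMaker training job, feeding the transformed datasets in as channels.
    import sagemaker
    from sagemaker.estimator import Estimator

    estimator = Estimator(
        image_uri="123456789012.dkr.ecr.us-west-2.amazonaws.com/offer-valuation:latest",  # placeholder
        role="arn:aws:iam::123456789012:role/sagemaker-training",  # placeholder
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://example-bucket/offer-valuation/baseline/models/",
        sagemaker_session=sagemaker.Session(),
    )
    estimator.fit({
        "train": "s3://example-bucket/offer-valuation/baseline/train/",
        "validation": "s3://example-bucket/offer-valuation/baseline/validation/",
    })

with DAG(
    dag_id="offer_valuation_baseline_training",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    transform = SparkSubmitOperator(
        task_id="transform_raw_datasets",
        application="/opt/jobs/transform_raw_datasets.py",  # placeholder Spark job
    )
    train = PythonOperator(task_id="train_model", python_callable=launch_training)

    transform >> train
```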
Pillar then orchestrates training flows within the SageMaker train entrypoint and evaluates models as they finish training. The evaluation provides us with a publishing decision for downstream tasks to interpret. We then serialize the model and save it to S3 using a pathing structure that encodes the project, variant name, and other salient metadata.
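The exact layout is internal, but a hypothetical version of that pathing scheme might be assembled like this:

```python
from datetime import datetime, timezone

def model_artifact_path(bucket: str, project: str, variant: str, run_id: str) -> str:
    """Build an S3 path encoding the project, variant, and training run (illustrative layout)."""
    trained_at = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return f"s3://{bucket}/models/{project}/{variant}/{trained_at}-{run_id}/model.tar.gz"

# e.g. s3://example-bucket/models/offer-valuation/baseline/20190601T120000Z-abc123/model.tar.gz
print(model_artifact_path("example-bucket", "offer-valuation", "baseline", "abc123"))
```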
If the decision is made to publish, Airflow retrieves the variant S3 path generated in the previous step and passes it to Gitlab, where another deployment pipeline uses Helm to deploy to Kubernetes through our real-time scoring solution, Scorcerer. If we decide not to publish, we trigger an alert for manual review.
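One way to make that hand-off from Airflow is Gitlab’s pipeline trigger API, passing the model location through as a pipeline variable; the project ID, token, and variable name below are placeholders.

```python
import requests

def trigger_deployment(model_s3_path: str) -> None:
    # Trigger the (hypothetical) Gitlab deployment pipeline so the Helm step can
    # pick up the freshly published model artifact.
    response = requests.post(
        "https://gitlab.example.com/api/v4/projects/1234/trigger/pipeline",
        data={
            "token": "<trigger-token>",  # stored in a secret manager in practice
            "ref": "master",
            "variables[MODEL_S3_PATH]": model_s3_path,
        },
        timeout=30,
    )
    response.raise_for_status()
```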
We log metrics for scoring requests made to Scorcerer in Datadog for debugging, and in Zillow’s Data Streaming Platform for longer-term offline prediction modeling. This allows us to reconcile scoring requests against ground-truth datasets after the fact and ensure our performance correlates with actual values.
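As a simple illustration of the Datadog side (the metric names and tags are made up), a scoring path can emit request counts and latencies through DogStatsD:

```python
import time

from datadog import initialize, statsd

# Point DogStatsD at the local Datadog agent (these are the default host and port).
initialize(statsd_host="localhost", statsd_port=8125)

def score_with_metrics(score_fn, features, project: str, variant: str):
    # Wrap a scoring call so every request emits a count and a latency metric.
    tags = [f"project:{project}", f"variant:{variant}"]
    start = time.time()
    prediction = score_fn(features)
    statsd.increment("scoring.requests", tags=tags)
    statsd.histogram("scoring.latency_ms", (time.time() - start) * 1000.0, tags=tags)
    return prediction
```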
Artifacts built in Gitlab are consumed by daily training jobs orchestrated by Airflow.
As Zillow Offers continues to roll out to additional markets we’ve seen an increase in the complexity of the requirements we’re being asked to support. At the same time, we’re increasingly asked to ensure that any models we produce are interpretable for our colleagues internally, as well as for our customers. The next generation of our platform will take this into consideration and is likely to focus on pipelining technologies at both training and inference time. If this sounds like something you’d be interested in helping us out with, we’re hiring.
Many thanks to Steven Hoelscher, Alex Pryiomka, Ezra Schiff, Sruthi Nagalla & Taylor McKay for guidance and editorial help on this article and to the entire ZO Machine Learning team for their efforts getting us this far!