During the summer of 2020, I had the opportunity to take a remote machine learning internship with Zillow’s Relevance Team while in the final year of my master’s degree in computer science.
The Relevance Team at Zillow extracts numerical (price, bedroom count, etc.) and categorical (property type, etc.) features from property listings to build recommendations on similar homes, personalized search rankings, and other personalization systems for customers.
When I started, our systems did not incorporate the text of listing descriptions as an input to these recommendation systems. So during the summer I built a highly scalable system for generating embedding vectors from listing descriptions that could be used in an array of downstream applications.
I also integrated my embeddings into an existing Zillow recommendation system for similar homes and evaluated their impact on information retrieval metrics. One of my favorite highlights of the summer was seeing the positive lift the embeddings generated in those offline metrics. That impact made me more confident that this work could boost several other machine learning systems within the company and give Zillow users a better navigation experience.
The flowchart below summarizes what we accomplished with this project:
I started with exploratory data analysis (EDA) of the textual descriptions and considered potential machine learning models for converting them into embedding vectors, ultimately settling on a Word2Vec-based model that I trained on roughly 10 million listing descriptions. Using t-SNE and other visualization tools, I observed high semantic similarity between nearby description embeddings.
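The core of this approach can be sketched in a few lines: a description embedding is the average of the word vectors of its tokens. The snippet below is only an illustration; the toy vocabulary with random vectors stands in for the word vectors a trained Word2Vec model would provide.

```python
import numpy as np

# Toy stand-in for a trained word-vector table; in the real pipeline these
# vectors would come from a Word2Vec model trained on listing descriptions.
rng = np.random.default_rng(42)
DIM = 50
vocab = {w: rng.standard_normal(DIM)
         for w in ["golf", "course", "updated", "kitchen", "views"]}

def embed_description(text, vocab, dim=DIM):
    """Average the word vectors of in-vocabulary tokens.

    Returns a zero vector when no token is in the vocabulary, so every
    description maps to a fixed-size embedding.
    """
    tokens = text.lower().split()
    vecs = [vocab[t] for t in tokens if t in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

emb = embed_description("Charming home near the golf course", vocab)
print(emb.shape)  # (50,)
```

Averaging is simple but surprisingly effective for short texts like listing descriptions, and it keeps inference cheap enough to run over millions of listings.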
The screenshot above shows a cluster of listing description embeddings generated from my Word2Vec model that are marked with a key phrase from the text they represent. In this picture, we see a cluster of similar vectors that all share references to golf courses. This similarity was what I was hoping to see prior to integrating the embeddings, as we wanted to incorporate new features that could not already be found in more structured attributes of a property like its price or bedroom count.
Following model training, I started building out a pipeline that could use the model (along with any other embeddings model) to generate millions of listing description vectors at scale. I also extended an Airflow DAG maintained by the relevance team to use my pipeline for generating embeddings on a daily basis.
This embeddings generation pipeline is now able to create 2+ million listing description embeddings from several types of machine learning models, with each model generating all embeddings in under 10 minutes.
The system supports the following trained models for embeddings generation where the desired model and parameters can be easily specified in a configuration file:
● Word2Vec
● Word2Vec-TF-IDF
● GloVe
● BERT
● Doc2Vec
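To give a flavor of the configuration-driven design, a config for such a pipeline might look like the fragment below. The field names and values here are purely illustrative, not Zillow's actual schema.

```
# Hypothetical pipeline configuration -- field names are illustrative only.
embedding_model: word2vec_tfidf   # one of: word2vec, word2vec_tfidf, glove, bert, doc2vec
model_params:
  vector_size: 100
  min_count: 5
output:
  path: s3://bucket/listing-embeddings/
  batch_size: 10000
```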
The pipeline is also highly abstracted so that it can be extended by other engineers to support inference and training for other models.
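One common way to structure that kind of abstraction is a small interface that every model implements, so the pipeline only ever calls a single `embed` method. The sketch below is my own illustration of the pattern, not the pipeline's actual code, and the example implementation is deliberately trivial.

```python
from abc import ABC, abstractmethod
import numpy as np

class EmbeddingModel(ABC):
    """Hypothetical interface a new model would implement to plug into
    an embeddings-generation pipeline."""

    @abstractmethod
    def embed(self, description: str) -> np.ndarray:
        """Return a fixed-size embedding for one listing description."""

class BagOfHashesModel(EmbeddingModel):
    """Trivial example implementation: hash tokens into a fixed-size
    count vector (not one of the production models)."""

    def __init__(self, dim=32):
        self.dim = dim

    def embed(self, description):
        vec = np.zeros(self.dim)
        for token in description.lower().split():
            vec[hash(token) % self.dim] += 1.0
        return vec

model = BagOfHashesModel()
print(model.embed("cozy cottage near the golf course").shape)  # (32,)
```

With an interface like this, swapping Word2Vec for BERT or Doc2Vec is a configuration change rather than a pipeline rewrite.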
After completing this generation pipeline, I integrated the embeddings into a recommendation system for similar homes. Several of my embedding techniques yielded significant increases in key information retrieval metrics, including nDCG (normalized discounted cumulative gain), MAP (mean average precision), MAR (mean average recall), and MAF1 (mean average F1 score), for non-vacant properties, and even larger improvements for vacant properties. As vacant listings typically carry less information in structured form, the larger improvement on these properties is likely due to the listing description embeddings capturing signal from unstructured data that makes such properties easier to compare.
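For readers unfamiliar with the first metric: nDCG compares a ranking's discounted cumulative gain, where each item's relevance is discounted by the log of its rank, to the gain of the ideal ordering. A minimal implementation looks like this:

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2(rank + 1)."""
    rel = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, len(rel) + 2))  # ranks 1..n -> log2(2..n+1)
    return float(np.sum(rel / discounts))

def ndcg(relevances):
    """nDCG: DCG of the given ranking divided by DCG of the ideal ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

print(ndcg([3, 2, 1]))  # 1.0: the list is already in ideal order
```

A ranking that places highly relevant homes lower in the list scores below 1.0, which is what makes nDCG a natural fit for evaluating similar-home recommendations.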
The following two charts are a sample of the gains observed with Word2Vec-trained embeddings on non-vacant properties in offline analysis:
This increase in offline metrics suggests that my embeddings will likely boost the quality of similar home recommendations by the Relevance Team’s existing model when run in production, and the team plans to run a live A/B test using the embeddings to validate online performance.
While my analysis focused on the similar homes use case, the team also plans to experiment with using these embeddings to improve other models such as global relevance sort and personalized search ranking, which are models used to rank recommendations so that the most relevant properties appear higher in the list presented to users.
In general, these results validated the team’s hypothesis that using deep-learning-based text embeddings in our models would improve key relevance metrics. Personally, building rich embedding features at scale was an incredibly valuable opportunity to apply natural language processing theory from my graduate classes to interesting applications in industry.
I had a wonderful experience at Zillow last summer and my internship only solidified my decision to pursue machine learning as a career. As a result, I’m excited to re-join Zillow as a full-time Applied Scientist in 2021 and continue working on similar projects on the AI Personalization team!
The internship experience at Zillow
Overall, I really enjoyed how balanced my internship was in both data science and engineering. I not only had an opportunity to iterate through different modeling ideas, but also a chance to put them into production and figure out ways to build them at scale.
I wanted to thank Eric Nichols (my internship manager), Shruti Kamath, Saeid Balaneshin, Sangdi Lin, and several other members of the Relevance team for their support. I’ve learned an incredible amount about natural language processing and machine learning from these folks, as well as great engineering practices for building models at scale. I would be thrilled to work with any of them again in the future and can’t wait to see how the team uses my embeddings after my internship.
My biggest recommendation to future interns would be to clarify ambiguities early. If you are confused at all about how to tackle something, please do not be afraid to ask others on your team for help. Everyone I’ve worked with this summer has been incredibly helpful and really embodies one of the company’s core values: #ZGIsATeamSport.
Lastly, I wanted to thank Zillow for letting interns participate in HackWeek! I had an awesome opportunity to work on a sustainable energy project that brings more attention to how “green” Zillow properties are by estimating their carbon footprints. Big shout-out to the other Zillow employees that I worked with on this project — Gloria Deng, Aditya Sundaram, Bryce Barton, Nicole Bachaud, Tony Wang and Shyam Pinnipati.