Real-time Personalization using Embeddings for Search Ranking

Source

Problem Statement

For a given search result on Airbnb, users still have to browse through thousands of listing to find the house they want to rent. We need a search ranking approach to surface the most relevant listings to the users in real-time.

Suppose we have a list of candidates, c1,c2,c3,...c_1, c_2, c_3,..., we want to rank them

f(c1,c2,c3,...)ranked list  c1,c2c3,...f(c_1, c_2, c_3, ...) \rightarrow \text{ranked list}\;c^\prime_1, c^\prime_2 c^\prime_3, ...

The current search ranking model uses 100+ features

  • Listing Features: price, reviews, capacity

  • Query Features: destination, number of nights stay, number of guests

  • Guest Features: past bookings, price preferences, short term click/skip history)

In this paper, the authors proposed a new real-time personalization features to the ranking model using short term user interactions, such as clicks and skips.

Introduction

Airbnb is considered a two-sided marketplace where the search results need to be optimized for sellers and buyers.

In the case of Airbnb, there is a clear need to optimize results for both hosts and guets, meaning that given a ninput query with location and trip dates we need to rank high listings whose location, price, style, reviews, etc. are appealing to the guest and at the same time, are a good match in terms of host preferences for trip duration and lead days.

Guests typically conduct multiple searches before booking. They may click into more than one listing and contact different hosts before deciding where to stay. We can use these in-session signals, such as clicks, host contacts, etc. for real-time personalization.

The aim is to show to the guest more of the listings similar to the ones we think they liked since starting the search session. At the same time, we can use the negative signals to show the guest less of the listings similar to the ones we think they did not like.

In addition to Real-time Personalization using immediate user actions, we introduce another type of embeddings trained on bookings to be able ot capture user's long term interest. Due to the nature of travel business, where users travel 1-2 times per year on average, bookings are a sparse signal, with a long tail of users with a single booking. To tackle this, we propose to train embeddings at a level of user type, instead of a particular user ID, where type is determined using many-to-one rule-based mapping that leverages known user attributes. At the same time we learn listing type embeddings in the same vector space as user type embeddings. This enables us to calculate similarities between user type embedding of the user who is conducting a search and listing type embeddings of candidate listings that need to be ranked.

This paper is taking a NLP approach toward embeddings. In NLP, embedding models are trained by directly taking into account the word order and their co-occurrence, based on the assumption that words frequently appearing together in the sentences also share more statistical dependence. Taking this idea one step further, we can use user interactions as context to train item embeddings based on the assumption that users tend to click on similar listings for a specific search purpose.

Researchers from Web Search, E-commerce, and Marketplace domains have quickly realized that just like one can train word embeddings by treating a sequence of words in a sentence as context, same can be done for training embeddings of user actions, e.g. items that were clicked or purchased, queries and ads that were clicked, by treating sequence of user actions as context.

Methodology

There are two distinct approaches

  • Listing embeddings for short-term real-time personalization

  • User type & listing type embeddings for long term personalization

Listing Embeddings

We are given a set of click sessions obtained from N users, where each session s=(l1,l2,...,lM)Ss = (l_1, l_2, ..., l_M) \in S is defined as an uninterrupted sequence of M listing IDs that were clicked by the user. A new session is started whenever there is a time gap of more than 30 minutes between two consecutive user clicks.

The aim is to learn a d-dimensional real-valued representation vliRdv_{l_i} \in \mathbb{R}^d of each unique listing lil_i, such that similar listings lie nearby in the embedding space.

The loss objective of the model is to learn listing representation using skip-gram model by maximizing LL over the entire set SS of search sessions.

L=ΣsSΣlis(Σmjm,i0logP(li+jli))L = \Sigma_{s \in S} \Sigma_{l_i \in s} \left (\Sigma_{-m \leq j \leq m, i\neq 0} log P(l_{i+j} \mid l_{i}) \right)

The probability of observing a listing li+jl_{i+j} from the contextual neighborhood of clicked listing lil_i is defined using softmax.

P(li+jli)=exp(vlivli+j)Σl=1Vexp(vlivl)P(l_{i+j} \mid l_i) = \frac{exp(v_{l_i} \cdot v_{l_{i+j}}^\prime)}{\Sigma_{l=1}^V exp(v_{l_i} \cdot v_l^\prime)}
  • vlv_l and vlv_l^\prime are the input and output vector representations of listing ll.

  • Hyperparameter mm is defined as a length of the relevant forward looking and backward looking context (neighborhood) for a clicked listing.

  • VV is a vocabulary defined as a set of unique listing IDs in the dataset.

Basically, it models temporal context of listing click sequences, where listings with similar contexts will have similar representations.

Negative Sampling

The time required to compute the gradient of the objective function is proportional to the vocabulary size $V$, which for large vocabularies, e.g. several million listing IDs, is an infeasible task.

We need to use negative sampling to reduce the computational complexity. Negative sampling can be formulated as follows. We generate a set of DpD_p of positive pairs of clicked listings, and their contexts cc (i.e. clicks on other listings by the same user that happened before and after click on listings ll within a window of length mm), and a set of DnD_n of negative pairs of clicked listings and nn randomly sampled listings from the entire vocabulary.

The optimization objective becomes:

argmax  Σl,cDplog11+evcvl+Σl,cDnlog11+evcvl\text{argmax}\; \Sigma_{l,c \in D_p} log \frac{1}{1 + e^{-v^\prime_c v_l}} + \Sigma_{l,c \in D_n} log \frac{1}{1 + e^{v^\prime_c v_l}}

In other words, it is maximizing the probability of clicked listing given its positive neighbors, minimize the probability of clicked listing given the sampled negative neighbors.

Booked Listing as Global Context

We can break down the click sessions SS into

  1. Booked Sessions, i.e. click sessions that end with user booking a listing to stay at.

  2. Exploratory Sessions, i.e. click session that do not end with booking.

Both are useful for capturing contextual similarity, however booked sessions can be used to adapt the optimization such that at each step, we predict not only the neighboring clicked listings but the eventually booked listing as well. This adaption can be achieved by adding booked listing as global context, such that it wil always be predicted no matter if it is within the context window or not.

argmax  Σl,cDplog11+evcvl+Σl,cDnlog11+evcvl+log11+evlbvl\text{argmax}\; \Sigma_{l,c \in D_p} log \frac{1}{1 + e^{-v^\prime_c v_l}} + \Sigma_{l,c \in D_n} log \frac{1}{1 + e^{v^\prime_c v_l}} + log \frac{1}{1 + e^{-v^\prime_{l_b} v_l}}

where vlbv_{l_b} is the embedding of the booked listing lbl_b. For exploratory sessions, the updates are still conducted by previous optimizing objective.

Listing embeddings are learned from booked sessions using a sliding window of size 2n+1 that slides from the first cicked listing to the booked listing. At each step the embedding of the central listing vlv_l is being updated such that it predicts the embeddings of the context listings vcv_c from DpD_p and the booked listing vlbv_{l_b}. As the window slides, some listings fall in and out of the context set, while the booked listing always remain within it as global context.

Adapting Training for Congregated Search

Users of online travel booking sites typically search only within a single market. As a consequence, there is a high probability that DpD_p contains listings from the same market. On the other hand, due to random sampling of negatives, it is very likely that DnD_n contains mostly listings that are not from the same markets as listing in DpD_p.

At each step, given a central listing ll, the positive context mostly consist of listings listings that are from the same market as ll, while the negative context mostly consists of listings that are not from the same market as ll. This imbalance leads to learning sub-optimal-within-market similarities. It merely drawing a separation between markets, which is not that helpful on predicting actual similarities.

To address this issue, we need to add a set of random negatives DmnD_{m_n}, sampled from the market of the central listing ll.

argmax  Σl,cDplog11+evcvl+Σl,cDnlog11+evcvl+log11+evlbvl+Σl,mnDmnlog11+evmnvl\text{argmax}\; \Sigma_{l,c \in D_p} log \frac{1}{1 + e^{-v^\prime_c v_l}} + \Sigma_{l,c \in D_n} log \frac{1}{1 + e^{v^\prime_c v_l}} + log \frac{1}{1 + e^{-v^\prime_{l_b} v_l}} + \Sigma_{l,m_n \in D_{m_n}} log \frac{1}{1 + e^{v^\prime_{m_n} v_l}}

Cold-start Listing Embeddings by Averaging Neighbors

Everyday new listings are created by hosts and made available on Airbnb. When new listings are added, they don't have any embeddings because they were never present in the click sessions SS training data. To create embeddings for new listings, we need to utilize existing embeddings of other listings.

Upon listing creation, the host is required to provide information about the listing, such as location, price, listing type, and etc... We use the provided meta-data about the listing to find 3 geographically closest listing within a 10 miles radius that have embeddings, are of the same listing type as the new listing (e.g. $20-$25 per night). We use the average of the 3 vectors to form the new listing embedding.

UIser-type & Listing-type Embeddings

Given a user who has made past bookings in New York and London, it would be useful to recommend listings that are similar to those previously booked ones.

While some cross-market similiarities are captured in listing embeddings trained using clicks, a more principla way of learning such cross-market similarities would be to learn from sessions constructed of listings that a particular user booked over time.

Suppose we are given a set SbS_b of booking sessions obtained from N users, where each booking session sb=(lb1,lb2,...,lbM)s_b = (l_b1, l_b2, ..., l_bM) is defined as a sequence of listings booked by user j ordered in time.

It'd be challenging to learn embeddings for each listing using this booking session dataset because the data are far too sparse. Booking is much less frequent event than clicking. Also most users don't book more than 5 times on Airbnb. The contextual information is too little. Lastly, long time intervals may pass between two consecutive bookings by users. The users may change preference drastically due to career changes, family situation and etc...

To addresss these very common marketplace problems in practice, we propose to learn embeddings at a level of listing type instead of listing ID. Given meta-data available for a certain listing ID such as location, price, listing type, capacity, number of etcs and etc..., we use a rule-based mapping to determine its listing type.

In other words, manually map the listing to a category using attributes like

  • Number of bookings

  • Price per night

  • Price per guest

  • Capacity

  • Number of reviews

  • Listing 5 stars

  • Number of beds, bathrooms, bedrooms

Many listings will map into the same listing type. Instead of learning an embedding for a listing, now the embedding is done on the type. We now have enough data from the booking session dataset to cover listing type embeddings.

To account for user ever-changing preferences over time, we propose to learn user type embeddings in the same vector space as listing type embeddings. The user type is determined using a similar procedure we applied to listings, i.e. by leveraging metadata about user and their previous bookings.

Same procedure here, map user to a category using attributes like

  • Number of bookings

  • Price per night spent

  • Price per guest spent

  • Capacity needed

Training Procedure

To learn user type and listing type embedings in the same vector space, we incorporate user types into the booking sessions. Now the SbS_b of NbN_b booking sessions from NN users are tuples of user type to listing type instead of user ID to listing ID.

The objective that needs to be opitmized is similar to the listing embeddings from previous session. Instead of listing ll, the center item needs to be updated is either user type utu_t or listing type ltl_t depending which one is caught in the sliding window.

Explicit Negatives for Rejections

Unlike clicks that only reflect guest-side preferences, bookings reflect host-side preferences as well. Some of the reasons for host rejections are bad guest star ratings, incomplete or empty guest profile, no profile picture and etc. These characteristics are part of the user type information.

Host rejections can be utilized during training to encode the host preference signal in the embedding space ain addition to the guest preference signal. The whole purpose of incorporating the rejection signal is that some listing types are less sensitive to user type with no bookings, incomplete profiles and less than average guest star ratings than others, and we want the embeddings of those listing types and user t ypes to be closer in the vector space.

Last updated