Real-time Personalization using Embeddings for Search Ranking
For a given search on Airbnb, users still have to browse through thousands of listings to find the house they want to rent. We need a search ranking approach that surfaces the most relevant listings to users in real time.
Suppose we have a list of candidate listings that we want to rank.
The current search ranking model uses 100+ features:
Listing Features: price, reviews, capacity
Query Features: destination, number of nights stay, number of guests
Guest Features: past bookings, price preferences, short term click/skip history
In this paper, the authors propose new real-time personalization features for the ranking model, built from short-term user interactions such as clicks and skips.
Airbnb is considered a two-sided marketplace where the search results need to be optimized for sellers and buyers.
In the case of Airbnb, there is a clear need to optimize results for both hosts and guests: given an input query with location and trip dates, we need to rank highly the listings whose location, price, style, reviews, etc. are appealing to the guest and that, at the same time, are a good match in terms of host preferences for trip duration and lead days.
Guests typically conduct multiple searches before booking. They may click into more than one listing and contact different hosts before deciding where to stay. We can use these in-session signals, such as clicks, host contacts, etc. for real-time personalization.
The aim is to show to the guest more of the listings similar to the ones we think they liked since starting the search session. At the same time, we can use the negative signals to show the guest less of the listings similar to the ones we think they did not like.
In addition to real-time personalization using immediate user actions, we introduce another type of embeddings trained on bookings to be able to capture users' long-term interests. Due to the nature of the travel business, where users travel 1-2 times per year on average, bookings are a sparse signal, with a long tail of users with a single booking. To tackle this, we propose to train embeddings at the level of user type, instead of a particular user ID, where type is determined using a many-to-one rule-based mapping that leverages known user attributes. At the same time we learn listing type embeddings in the same vector space as user type embeddings. This enables us to calculate similarities between the user type embedding of the user who is conducting a search and the listing type embeddings of candidate listings that need to be ranked.
This paper takes an NLP approach toward embeddings. In NLP, embedding models are trained by directly taking into account the word order and co-occurrence, based on the assumption that words frequently appearing together in sentences also share more statistical dependence. Taking this idea one step further, we can use user interactions as context to train item embeddings, based on the assumption that users tend to click on similar listings for a specific search purpose.
Researchers from Web Search, E-commerce, and Marketplace domains have quickly realized that just like one can train word embeddings by treating a sequence of words in a sentence as context, the same can be done for training embeddings of user actions, e.g. items that were clicked or purchased, queries and ads that were clicked, by treating sequences of user actions as context.
There are two distinct approaches:
Listing embeddings for short-term real-time personalization
User type & listing type embeddings for long term personalization
Basically, it models the temporal context of listing click sequences, where listings with similar contexts will have similar representations.
Negative Sampling
The time required to compute the gradient of the objective function is proportional to the vocabulary size $V$, which for large vocabularies, e.g. several million listing IDs, is an infeasible task.
The optimization objective becomes:

$$\operatorname*{argmax}_{\theta} \sum_{(l,c) \in \mathcal{D}_p} \log \frac{1}{1 + e^{-\mathbf{v}'_c \cdot \mathbf{v}_l}} + \sum_{(l,c) \in \mathcal{D}_n} \log \frac{1}{1 + e^{\mathbf{v}'_c \cdot \mathbf{v}_l}}$$
In other words, it maximizes the probability of a clicked listing given its positive neighbors, and minimizes the probability of a clicked listing given the sampled negative neighbors.
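To make the objective concrete, the per-pair sigmoid loss can be sketched in plain Python. This is a minimal illustration, not the paper's implementation; `pair_loss` and the toy vectors are hypothetical names.

```python
import math

def pair_loss(v_center, v_context, positive):
    """Sigmoid loss for one (clicked listing, neighbor) pair.

    positive=True  -> pair from the positive set (maximize co-occurrence prob.)
    positive=False -> pair from the sampled negatives (minimize it)
    """
    dot = sum(a * b for a, b in zip(v_center, v_context))
    sign = 1.0 if positive else -1.0
    # -log sigmoid(sign * (v_center . v_context))
    return -math.log(1.0 / (1.0 + math.exp(-sign * dot)))
```

For two well-aligned vectors, the positive-pair loss is small while the negative-pair loss is large, which is exactly the pressure that pulls clicked listings together and pushes random listings apart.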
Booked Listing as Global Context
Booked Sessions, i.e. click sessions that end with user booking a listing to stay at.
Exploratory Sessions, i.e. click session that do not end with booking.
Both are useful for capturing contextual similarity; however, booked sessions can be used to adapt the optimization such that at each step we predict not only the neighboring clicked listings but the eventually booked listing as well. This adaptation can be achieved by adding the booked listing as global context, such that it will always be predicted no matter whether it is within the context window or not.
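The global-context idea can be sketched as a pair-generation step: every center listing gets its window neighbors as context plus the booked listing, regardless of window position. This is a hypothetical minimal sketch; `skipgram_pairs` and its arguments are illustrative, not the paper's code.

```python
def skipgram_pairs(session, booked, m=2):
    """Generate (center, context) training pairs from a booked click session.

    Listings within the +/-m window around each center are positive context;
    the booked listing is added as global context for every center, whether
    or not it falls inside the window.
    """
    pairs = []
    for i, center in enumerate(session):
        lo, hi = max(0, i - m), min(len(session), i + m + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, session[j]))
        if booked != center:
            pairs.append((center, booked))  # booked listing as global context
    return pairs
```

With `m=1`, a listing three clicks away from the center would normally never be a context item, but if it is the booked listing it still appears in every center's pair list.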
Adapting Training for Congregated Search
Cold-start Listing Embeddings by Averaging Neighbors
Upon listing creation, the host is required to provide information about the listing, such as location, price, and listing type. We use the provided metadata to find the 3 geographically closest listings within a 10-mile radius that have embeddings, are of the same listing type as the new listing, and belong to the same price bucket (e.g. $20-$25 per night). We use the average of the 3 vectors to form the new listing's embedding.
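The averaging step can be sketched as below. The field names (`listing_type`, `price_bucket`, `miles_away`) and the candidate-dict schema are illustrative assumptions, not Airbnb's actual data model.

```python
def cold_start_embedding(new_listing, candidates, embeddings, k=3):
    """Approximate a new listing's embedding by averaging the embeddings of
    the k geographically closest existing listings with the same listing
    type and price bucket, within a 10-mile radius."""
    # keep listings that already have an embedding and match type/price/distance
    eligible = [c for c in candidates
                if c["id"] in embeddings
                and c["listing_type"] == new_listing["listing_type"]
                and c["price_bucket"] == new_listing["price_bucket"]
                and c["miles_away"] <= 10]
    eligible.sort(key=lambda c: c["miles_away"])
    nearest = [embeddings[c["id"]] for c in eligible[:k]]
    dim = len(next(iter(embeddings.values())))
    # component-wise mean of the neighbor vectors
    return [sum(v[d] for v in nearest) / len(nearest) for d in range(dim)]
```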
Given a user who has made past bookings in New York and London, it would be useful to recommend listings that are similar to those previously booked ones.
While some cross-market similarities are captured in listing embeddings trained using clicks, a more principled way of learning such cross-market similarities would be to learn from sessions constructed of listings that a particular user booked over time.
It would be challenging to learn embeddings for each listing from this booking session dataset because the data are far too sparse: booking is a much less frequent event than clicking, and most users book fewer than 5 times on Airbnb, so the contextual information is too little. Lastly, long time intervals may pass between two consecutive bookings by a user, during which the user's preferences may change drastically due to career changes, family situation, etc.
To address these very common marketplace problems, we propose to learn embeddings at the level of listing type instead of listing ID. Given the metadata available for a certain listing ID, such as location, price, listing type, and capacity, we use a rule-based mapping to determine its listing type.
In other words, we manually map the listing to a category using attributes like:
Number of bookings
Price per night
Price per guest
Capacity
Number of reviews
Listing 5-star rating
Number of beds, bathrooms, bedrooms
Many listings will map into the same listing type. Instead of learning an embedding per listing, the embedding is now learned per type, so the booking session dataset provides enough data to cover listing type embeddings.
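The rule-based mapping can be sketched as a bucketing function over the attributes listed above. The bucket boundaries and the `lt_…` naming scheme below are illustrative assumptions, not Airbnb's actual buckets.

```python
def listing_type(listing):
    """Map a listing's attributes into a coarse rule-based type bucket,
    so that many similar listings share one embedding."""
    def bucket(value, edges):
        # index of the first edge the value falls under
        for i, e in enumerate(edges):
            if value < e:
                return i
        return len(edges)

    return "lt_{}_{}_{}_{}".format(
        bucket(listing["bookings"], [1, 5, 20]),
        bucket(listing["price_per_night"], [50, 100, 200]),
        bucket(listing["capacity"], [2, 4, 8]),
        bucket(listing["reviews"], [5, 20]),
    )
```

Two listings with similar attributes land in the same type bucket and therefore share one embedding, which is what makes the sparse booking data sufficient.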
To account for users' ever-changing preferences over time, we propose to learn user type embeddings in the same vector space as listing type embeddings. The user type is determined using a procedure similar to the one we applied to listings, i.e. by leveraging metadata about the user and their previous bookings.
The same procedure applies here: map the user to a category using attributes like:
Number of bookings
Price per night spent
Price per guest spent
Capacity needed
Training Procedure
Explicit Negatives for Rejections
Unlike clicks, which only reflect guest-side preferences, bookings reflect host-side preferences as well. Some of the reasons for host rejections are bad guest star ratings, an incomplete or empty guest profile, no profile picture, etc. These characteristics are part of the user type information.
Host rejections can be utilized during training to encode the host preference signal in the embedding space in addition to the guest preference signal. The whole purpose of incorporating the rejection signal is that some listing types are less sensitive than others to user types with no bookings, incomplete profiles, and below-average guest star ratings, and we want the embeddings of those listing types and user types to be closer in the vector space.
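One way to picture this is as a pair-extraction step in which rejections become explicit negative pairs. The event schema `(user_type, listing_type, outcome)` and the function name below are illustrative assumptions about how such a pipeline could be organized.

```python
def training_pairs(session):
    """Split a user-type booking session into positive and explicit-negative
    pairs. Each event is a (user_type, listing_type, outcome) tuple, where
    outcome is "booked" or "rejected"."""
    positives, negatives = [], []
    for user_type, lst_type, outcome in session:
        if outcome == "booked":
            positives.append((user_type, lst_type))
        elif outcome == "rejected":
            # host rejection: an explicit negative that pushes this user type
            # and listing type apart in the embedding space
            negatives.append((user_type, lst_type))
    return positives, negatives
```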
We are given a set $\mathcal{S}$ of click sessions obtained from $N$ users, where each session $s = (l_1, \ldots, l_M) \in \mathcal{S}$ is defined as an uninterrupted sequence of $M$ listing IDs that were clicked by the user. A new session is started whenever there is a time gap of more than 30 minutes between two consecutive user clicks.
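The 30-minute session-splitting rule can be sketched as follows (a minimal illustration; `split_sessions` and the `(timestamp_minutes, listing_id)` click format are assumed names, not the paper's code):

```python
def split_sessions(clicks, gap_minutes=30):
    """Split a user's time-ordered click stream into sessions, starting a
    new session whenever two consecutive clicks are more than gap_minutes
    apart. Each click is a (timestamp_minutes, listing_id) tuple."""
    sessions = []
    current = []
    prev_t = None
    for t, listing_id in clicks:
        if prev_t is not None and t - prev_t > gap_minutes:
            sessions.append(current)  # gap too large: close the session
            current = []
        current.append(listing_id)
        prev_t = t
    if current:
        sessions.append(current)
    return sessions
```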
The aim is to learn a $d$-dimensional real-valued representation $\mathbf{v}_{l_i} \in \mathbb{R}^d$ of each unique listing $l_i$, such that similar listings lie nearby in the embedding space.
The objective of the model is to learn listing representations using the skip-gram model by maximizing, over the entire set $\mathcal{S}$ of search sessions:

$$\mathcal{L} = \sum_{s \in \mathcal{S}} \sum_{l_i \in s} \sum_{\substack{-m \le j \le m \\ j \ne 0}} \log \mathbb{P}(l_{i+j} \mid l_i)$$

The probability $\mathbb{P}(l_{i+j} \mid l_i)$ of observing a listing $l_{i+j}$ from the contextual neighborhood of clicked listing $l_i$ is defined using the softmax:

$$\mathbb{P}(l_{i+j} \mid l_i) = \frac{\exp(\mathbf{v}_{l_i}^{\top} \mathbf{v}'_{l_{i+j}})}{\sum_{l=1}^{|\mathcal{V}|} \exp(\mathbf{v}_{l_i}^{\top} \mathbf{v}'_{l})}$$
$\mathbf{v}_l$ and $\mathbf{v}'_l$ are the input and output vector representations of listing $l$.
Hyperparameter $m$ is defined as the length of the relevant forward-looking and backward-looking context (neighborhood) for a clicked listing.
$\mathcal{V}$ is a vocabulary defined as the set of unique listing IDs in the dataset.
We need to use negative sampling to reduce the computational complexity. Negative sampling can be formulated as follows. We generate a set $\mathcal{D}_p$ of positive pairs $(l, c)$ of clicked listings $l$ and their contexts $c$ (i.e. clicks on other listings by the same user that happened before and after the click on listing $l$ within a window of length $m$), and a set $\mathcal{D}_n$ of negative pairs $(l, c)$ of clicked listings and randomly sampled listings from the entire vocabulary.
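Constructing $\mathcal{D}_p$ and $\mathcal{D}_n$ from one session can be sketched as below (a hypothetical minimal sketch; `sample_pairs` and its parameters are illustrative):

```python
import random

def sample_pairs(session, vocabulary, m=2, num_neg=3, rng=random):
    """Build the positive set D_p (center, window context) and the negative
    set D_n (center, random listing from the entire vocabulary)."""
    d_p, d_n = [], []
    for i, center in enumerate(session):
        # positives: every listing within the +/-m window around the center
        for j in range(max(0, i - m), min(len(session), i + m + 1)):
            if j != i:
                d_p.append((center, session[j]))
        # negatives: listings sampled at random from the whole vocabulary
        for neg in rng.sample(vocabulary, num_neg):
            d_n.append((center, neg))
    return d_p, d_n
```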
We can break down the click sessions into booked sessions and exploratory sessions.
For booked sessions, the objective becomes:

$$\operatorname*{argmax}_{\theta} \sum_{(l,c) \in \mathcal{D}_p} \log \frac{1}{1 + e^{-\mathbf{v}'_c \cdot \mathbf{v}_l}} + \sum_{(l,c) \in \mathcal{D}_n} \log \frac{1}{1 + e^{\mathbf{v}'_c \cdot \mathbf{v}_l}} + \log \frac{1}{1 + e^{-\mathbf{v}'_{l_b} \cdot \mathbf{v}_l}}$$

where $\mathbf{v}_{l_b}$ is the embedding of the booked listing $l_b$. For exploratory sessions, the updates are still conducted by optimizing the previous objective.
Listing embeddings are learned from booked sessions using a sliding window of size $2m+1$ that slides from the first clicked listing to the booked listing. At each step the embedding of the central listing $l$ is updated such that it predicts the embeddings of the context listings and the booked listing $l_b$. As the window slides, some listings fall in and out of the context set, while the booked listing always remains within it as global context.
Users of online travel booking sites typically search only within a single market. As a consequence, there is a high probability that $\mathcal{D}_p$ contains listings from the same market. On the other hand, due to random sampling of negatives, it is very likely that $\mathcal{D}_n$ contains mostly listings that are not from the same markets as the listings in $\mathcal{D}_p$.
At each step, given a central listing $l$, the positive context mostly consists of listings from the same market as $l$, while the negative context mostly consists of listings that are not from the same market as $l$. This imbalance leads to learning sub-optimal within-market similarities: the model merely draws a separation between markets, which is not that helpful for predicting actual similarities.
To address this issue, we add a set $\mathcal{D}_{m_n}$ of random negatives sampled from the market of the central listing $l$.
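The extra same-market negatives can be sketched as below. The `listings_by_market` structure and the function name are illustrative assumptions.

```python
import random

def market_negatives(center, listings_by_market, market, num=2, rng=random):
    """Sample additional negative pairs from the same market as the central
    listing, to complement the vocabulary-wide random negatives."""
    # exclude the center itself from the same-market candidate pool
    pool = [l for l in listings_by_market[market] if l != center]
    return [(center, neg) for neg in rng.sample(pool, min(num, len(pool)))]
```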
Everyday new listings are created by hosts and made available on Airbnb. When new listings are added, they don't have any embeddings because they were never present in the click sessions training data. To create embeddings for new listings, we need to utilize existing embeddings of other listings.
Suppose we are given a set $\mathcal{S}_b$ of booking sessions obtained from $N$ users, where each booking session $s_b = (l_{b_1}, \ldots, l_{b_M})$ is defined as a sequence of listings booked by user $j$, ordered in time.
To learn user type and listing type embeddings in the same vector space, we incorporate user types into the booking sessions. The booking sessions from users now consist of tuples of (user type, listing type) instead of listing IDs.
The objective that needs to be optimized is similar to the listing embedding objective from the previous section. Instead of a listing $l$, the central item that needs to be updated is either a user type $u_t$ or a listing type $l_t$, depending on which one is caught in the sliding window.
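The sliding-window step over interleaved (user type, listing type) tuples can be sketched as follows (an illustrative sketch; `type_skipgram_pairs` is a hypothetical name, not the paper's code):

```python
def type_skipgram_pairs(session, m=2):
    """Flatten a booking session of (user_type, listing_type) tuples into a
    single interleaved token sequence and emit (center, context) pairs; the
    center is either a user type or a listing type, depending on window
    position."""
    tokens = [t for pair in session for t in pair]  # ut1, lt1, ut2, lt2, ...
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - m), min(len(tokens), i + m + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

Because user types and listing types share one token sequence, their embeddings end up in the same vector space, which is what allows comparing a searching user's type embedding directly against candidate listing type embeddings.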