Neural Collaborative Filtering

Collaborative filtering is traditionally done with matrix factorization. I did my movie recommendation project using good ol' matrix factorization. However, I recently discovered that people have proposed new ways to do collaborative filtering with deep learning techniques! There's a 2017 paper, titled Neural Collaborative Filtering, which describes an approach to performing collaborative filtering with neural networks.

Quoting the abstract: "In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation — collaborative filtering — on the basis of implicit feedback."

Here's the high-level idea.

[Figure: high-level overview of neural collaborative filtering]

Embeddings

We perform embedding for each user and item (movie). The embedding layer is simply a matrix dot product of the one-hot encoding of a user/movie and the embedding weights. Let's put it concretely. Suppose I have ten users; the one-hot encoding of each user will look like the following.
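The one-hot vectors are just the rows of a 10×10 identity matrix; a quick NumPy illustration:

```python
import numpy as np

num_users = 10

# One-hot encodings of the ten users: each user is a row of the identity matrix.
one_hot_users = np.eye(num_users)

print(one_hot_users[0])  # user 0 -> [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
print(one_hot_users[3])  # user 3 -> [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
```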

Now I need an embedding weight matrix which will map a user or movie to an embedding vector. Let's define the embedding matrix to be a matrix of shape (N, D) where N is the number of users or movies and D is the latent dimension of embedding.
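Continuing the NumPy sketch, embedding a user is then just the dot product of its one-hot vector with this (N, D) weight matrix, which picks out a single row:

```python
import numpy as np

num_users = 10   # N: number of users (or movies)
latent_dim = 5   # D: latent dimension of the embedding

# Randomly initialized embedding weight matrix of shape (N, D).
embedding_weights = np.random.randn(num_users, latent_dim)

# The embedding of user 3 is the dot product of its one-hot vector with the weights,
# which simply selects row 3 of the embedding matrix.
one_hot_user = np.eye(num_users)[3]
user_embedding = one_hot_user.dot(embedding_weights)

assert np.allclose(user_embedding, embedding_weights[3])
```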

The embedded vectors will then be fed into a deep neural network whose objective is to predict the rating a user gives to a movie. For example, user 1 may rate movie 1 with five stars; after training, the network should be able to predict that rating.

Dataset: Movie Ratings

Before we dive into the details of the architecture, let's take a look at the dataset that we will be using for this exercise. I got the data from MovieLens. I will use the small dataset with 100,000 movie ratings.

|       | user_id | movie_id | rating | timestamp  |
|-------|---------|----------|--------|------------|
| 70647 | 492     | 7625     | 4.0    | 898107789  |
| 29726 | 213     | 2029     | 4.0    | 1462639144 |
| 60508 | 439     | 956      | 3.0    | 1041115006 |
| 79013 | 547     | 4826     | 4.0    | 1022676793 |
| 65442 | 466     | 2176     | 2.0    | 945836678  |

|       | user_id | movie_id | rating | timestamp  |
|-------|---------|----------|--------|------------|
| 78939 | 547     | 1966     | 4.5    | 1093832403 |
| 28074 | 205     | 3576     | 4.0    | 1442152429 |
| 27716 | 201     | 178      | 5.0    | 1110498354 |
| 20913 | 144     | 76       | 3.0    | 837455700  |
| 58482 | 425     | 416      | 0.5    | 1112537722 |
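For reference, here's a minimal sketch of how the ratings might be loaded with pandas. The path and column names assume the standard ml-latest-small archive (whose raw CSV uses userId/movieId rather than the renamed columns shown above):

```python
import pandas as pd

# Assumes the ml-latest-small archive from MovieLens has been extracted locally.
ratings = pd.read_csv('ml-latest-small/ratings.csv')  # columns: userId, movieId, rating, timestamp

print(ratings.shape)      # roughly 100,000 rows, 4 columns
print(ratings.sample(5))  # a few random rows, similar to the tables above
```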

Generalized Matrix Factorization (GMF)

In the context of the paper, a generalized matrix factorization can be described by the following equation.

$$\hat{y}_{ui} = a\left(h^{T}(p_{u} \cdot q_{i})\right)$$

where $a$ is an activation function, $h$ is the edge weight vector of the output layer, and $p_{u}$ and $q_{i}$ are the embedding vectors for user $u$ and item $i$. The edge weights can be seen as an additional set of weights on the output layer.

Matrix Factorization

If we use an identity function for the activation and enforce the edge weights to be a uniform vector of 1, we can exactly recover the standard matrix factorization model.

[Model diagram: matrix factorization]

In the model above, we are not using any activation function and there are no additional edge weights on the output layer. The model represents classic matrix factorization. It takes two inputs, a user ID and a movie ID. The inputs are embedded into (1, 5) vectors, which are then flattened. The dot product of the flattened vectors is the predicted rating.
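As a rough sketch (not necessarily the exact code behind the diagram), the model can be written in Keras like this; the vocabulary sizes are placeholders you would take from the data:

```python
from tensorflow.keras.layers import Input, Embedding, Flatten, Dot
from tensorflow.keras.models import Model

num_users, num_movies, latent_dim = 700, 10000, 5  # placeholder sizes

user_input = Input(shape=(1,), name='user')
movie_input = Input(shape=(1,), name='movie')

# Each ID is embedded into a (1, 5) vector and then flattened.
user_vec = Flatten()(Embedding(num_users, latent_dim)(user_input))
movie_vec = Flatten()(Embedding(num_movies, latent_dim)(movie_input))

# The dot product of the two latent vectors is the predicted rating.
rating = Dot(axes=1)([user_vec, movie_vec])

model = Model(inputs=[user_input, movie_input], outputs=rating)
model.compile(optimizer='adam', loss='mse')
```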

We can go a little further and make this a non-negative matrix factorization by adding a non-negativity constraint on the embeddings.
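In Keras, one way to impose that constraint is through the embedding layer's constraint argument; a minimal sketch with placeholder sizes:

```python
from tensorflow.keras.constraints import NonNeg
from tensorflow.keras.layers import Embedding

num_users, num_movies, latent_dim = 700, 10000, 5  # placeholder sizes

# Same embeddings as before, but the weights are kept non-negative during training.
user_embedding = Embedding(num_users, latent_dim, embeddings_constraint=NonNeg())
movie_embedding = Embedding(num_movies, latent_dim, embeddings_constraint=NonNeg())
```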

Neural Network with MF

Now let's add some non-linearities to make it a non-linear matrix factorization, which essentially means appending a neural network to the end of the model.

[Model diagram: matrix factorization with a neural network appended]
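One plausible way to wire this up, sketched under my own assumptions about the diagram rather than as the exact model shown: combine the two embeddings with an element-wise product and push the result through a couple of dense layers with non-linear activations before the final prediction.

```python
from tensorflow.keras.layers import Input, Embedding, Flatten, Multiply, Dense
from tensorflow.keras.models import Model

num_users, num_movies, latent_dim = 700, 10000, 5  # placeholder sizes

user_input = Input(shape=(1,), name='user')
movie_input = Input(shape=(1,), name='movie')

user_vec = Flatten()(Embedding(num_users, latent_dim)(user_input))
movie_vec = Flatten()(Embedding(num_movies, latent_dim)(movie_input))

# Element-wise product of the latent vectors, followed by a small non-linear stack.
interaction = Multiply()([user_vec, movie_vec])
hidden = Dense(16, activation='relu')(interaction)
hidden = Dense(8, activation='relu')(hidden)
rating = Dense(1)(hidden)

model = Model(inputs=[user_input, movie_input], outputs=rating)
model.compile(optimizer='adam', loss='mse')
```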

Multi-Layer Perceptron

The paper proposes a slightly different architecture from the one I showed above. We have already covered the generalized matrix factorization model; it's just a reframing of the original matrix factorization technique in a neural network architecture. Now we take a step further and create two pathways to model user–item interactions. The multi-layer perceptron is essentially a deep neural network similar to the one shown above, except that we now pull it out into a separate pathway instead of appending it to the end of the vanilla matrix factorization.

Here's a straightforward approach, quoted directly from the paper.

Let GMF and MLP share the same embedding layer and then combine the outputs of their interaction functions. This way shares a similar spirit with the well-known Neural Tensor Network (NTN). Specifically, the model for combining GMF with a one-layer MLP can be formulated as:

$$\hat{y}_{ui} = \sigma\left(h^{T}a\left(p_{u} \cdot q_{i} + W\begin{bmatrix}p_{u}\\q_{i}\end{bmatrix}+b\right)\right)$$

However, the authors believed that sharing the embeddings of GMF and MLP might limit the performance of the fused model. They wanted to give the model more flexibility and allow GMF and MLP to learn separate embeddings, fusing the two models by concatenating their last hidden layers.

[Figure: neural collaborative filtering architecture with separate GMF and MLP pathways]
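Here's a compact Keras sketch of that fused architecture (called NeuMF in the paper). The layer sizes are my own placeholders, and I use a plain linear output with MSE to stay consistent with rating prediction; the paper itself uses a sigmoid output with log loss for implicit feedback.

```python
from tensorflow.keras.layers import Input, Embedding, Flatten, Multiply, Concatenate, Dense
from tensorflow.keras.models import Model

num_users, num_movies = 700, 10000   # placeholder sizes
gmf_dim, mlp_dim = 8, 16             # separate latent dimensions for the two pathways

user_input = Input(shape=(1,), name='user')
movie_input = Input(shape=(1,), name='movie')

# GMF pathway: its own embeddings, combined with an element-wise product.
gmf_user = Flatten()(Embedding(num_users, gmf_dim)(user_input))
gmf_movie = Flatten()(Embedding(num_movies, gmf_dim)(movie_input))
gmf_out = Multiply()([gmf_user, gmf_movie])

# MLP pathway: its own embeddings, concatenated and fed through dense layers.
mlp_user = Flatten()(Embedding(num_users, mlp_dim)(user_input))
mlp_movie = Flatten()(Embedding(num_movies, mlp_dim)(movie_input))
mlp_out = Concatenate()([mlp_user, mlp_movie])
mlp_out = Dense(32, activation='relu')(mlp_out)
mlp_out = Dense(16, activation='relu')(mlp_out)

# Fuse the two pathways by concatenating their last hidden layers.
fused = Concatenate()([gmf_out, mlp_out])
rating = Dense(1)(fused)

model = Model(inputs=[user_input, movie_input], outputs=rating)
model.compile(optimizer='adam', loss='mse')
```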
