Netflix Rating Prediction Competition

4 min readSep 22, 2022

Even before Kaggle was launched, Netflix held a competition and announced 1 million as the price money.This happened in the year October 2, 2006. And this competition went on for 3 years before someone actually won and led to development of many research and new topics in the field of recommender systems. (Maybe this is how Google got an idea for kaggle, who knows).

The core idea of this competition was to come up with a collaborative filtering algorithm to predict the user rating for a given film with no personal information about the user. This competition led to many research papers and new techniques in the field of recommender systems. At it’s base this is a recommendation problem.

Why did Netflix want to predict the rating that a user will give to a movie ? Simple, to recommend movies to them based on the rating given by this recommender system.

And how did Netflix select the winner, like what was the benchmark ? In the 2000’s Netflix used a recommender system called Cinematch. This algorithm had a pretty good run and most of the people who rented the DVD’s this algorithm predicted gave 5 star rating to the movie. Wait did I just say, “rented” and “DVD’s” ? Yes I did,because netflix started of as a DVD renting company. Coming back to our question of what is the bench mark, cinematch algorithm had a RMSE of 0.9525 and anyone who can beat this by 10% was considered the winner and awarded 1 million dollars.

Just 10% ? Sounds easy, well no. And that’s why the competition went on for 3 years. Netflix productionized one of the solutions of the “Progress Price Winners”. A progress price was given to the best solution of that year, that had atleat 1% increase than previous year. The model Netflix Productionized used Matrix Factorization and Restricted Boltzmann Machines (RBM).

Fun Fact — Netflix never productionized the algorithm that won the grand price of 1 million dollar due to engineering effort required. Well that’s a important lesson there, real-life models need to be scallable.

Enough with the history lesson, let’s focus on what’s important. The algorithm that won the competition. We will dive deep into what’s matrix factorization and how the winners handled the problem.

They divided the total rating that is given to a movie by a user into sub parts. The 4sub-parts were baseline rating, User-specific rating, Movie-specific rating, Specific interaction rating.

Let me explain each one seperately.

Baseline rating : It’s the average or mean rating tthat all users gave to a specific movie. This rating is movie specific.

User-specific rating : This is specific to the user. There can be certain users who’re critical and doesn’t give high rating at all. So this specific user moslty rates movies 0.5 to 1 lesser than other user’s we have to consider this.

Movie-specific rating : For example, consider the movie Titanic, it’s rating will be possibly higher than avergae rating of other movie’s we need to consider this effect.

Specific interaction rating — Let’s consider a user A who likes psychothrillers, or scifi movies. This user might not like romcom movies, he might give a pretty low rating to “The Proposal” which is pretty good movie. This is specific to the preference of the user.

The winners used a GBDT, gradient boosted decision tree model. They combined over 500 models, as I specified before this model wasn’t productionized and I guess now you know why, the trade-off to the RMSE and the cost of prouductionization had to be taken into account.

The main take away from this article is how they divided the rating into sub-parts and focused on each effect to amplify the prediction and MF methods.

One of the best come outs of this competetion was Matrix Factorization (MF) methods.

Let’s see what it is and why is it used in Recommender Systems.

MF can be viewed as a collaborative filtering method. Assuming you kow what’s collaborative filtering, given user matrix and movie matrix, MF comes up with 2 matrixes, let us consider these 2 other matrixes to be U and R. U represents the relationship between k features and users, while R will represent the relationship between features k and rating.

It will be such that given i*j(row*col) user-rating matrix A, we decompose it into U matrix of i*k(user*feature) and R matrix of j*k(rating*feature). If we multiply L matrix with the transpose of M matrix we will get the original user-rating matrix.This method is called Non-Negative Matrix Factorization(NMF).

MF can be done in 2 ways, using NMF , as we saw above and SVD, like we will be seeing below.

But in simple words MF is a matrix decomposition method.

SVD, Singular Value decomposition, does the same except for the fact that we get an additional singular value, which gives us the strength of the features, which is also called latent features.Here U and V represents user-features matrix and rating-features matrix respectively.Where Sigma is a diagonal matrix with square roots of the eigenvalues of MTM.This Sigma is the singular value which gives the strength of the features.

And that’s how the Netflix price winners did it !

Netflix Rating Prediction Competition

Written by Raghavi_bala