Ensemble modelling means using more than one model for your predictions. Even though this increases the complexity, this approach has been shown to generate strong results.
To see this in action, take a look at the “Netflix Prize,” which began in 2006. The company announced it would pay $1 million to anyone or any team that could improve the accuracy of its movie recommendation system by 10% or more. Netflix also provided a dataset of over 100 million ratings of 17,770 movies from 480,189 users.16 There would ultimately be more than 30,000 downloads.
Why did Netflix do all this? A big reason is that the company’s own engineers were having trouble making progress. Then why not give it to the crowd to figure out? It turned out to be quite ingenious—and the $1 million payout was really modest compared to the potential benefits.
The contest certainly stirred up a lot of activity from coders and data scientists, ranging from students to employees at companies like AT&T.
Netflix also made the contest simple. The main requirement was that the teams had to disclose their methods , which helped boost the results (there was even a dashboard with rankings of the teams).
But it was not until 2009 that a team—BellKor’s Pragmatic Chaos—won the prize. Then again, there were considerable challenges.
So how did the winning team pull it off? The first step was to create a baseline model that smoothed out the tricky issues with the data. For example, some movies only had a handful of ratings, whereas others had thousands. Then there was the thorny problem where there were users who would always rate a movie with one star. To deal with these matters, BellKor used machine learning to predict ratings in order to fill the gaps.
Once the baseline was finished, there were more tough challenges to tackle like the following:
- A system may wind up recommending the same films to many users.
- Some movies may not fit well within genres. For example, Alien is really a cross of science fiction and horror.
- There were movies, like Napoleon Dynamite, that proved extremely difficult for algorithms to understand.
- Ratings of a movie would often change over time.
The winning team used ensemble modelling, which involved hundreds of algorithms. They also used something called boosting, which is where you build consecutive models. With this, the weights in the algorithms are adjusted based on the results of the previous model, which help the predictions get better over time (another approach, called bagging, is when you build different models in parallel and then select the best one).
But in the end, BellKor found the solutions. However, despite this, Netflix did not use the model! Now it’s not clear why this was the case. Perhaps it was that Netflix was moving away from five-star ratings anyway and was more focused on streaming. The contest also had blowback from people who thought there may have been privacy violations.
Regardless, the contest did highlight the power of machine learning—and the importance of collaboration.

Leave a Reply