Published on September 28, 2023 by Ana Beatriz Macedo  

This study highlights the potential of machine learning in predicting NBA MVP contenders, supporting data-driven decisions in sports analytics.

We aimed to predict NBA Most Valuable Player (MVP) contenders using four machine learning models trained on data from different decades. The models were evaluated on their ability to predict the top 10 and top 5 MVP contenders, and on whether they correctly predicted each contender's exact position in the final voting ranking. We also ran out-of-time tests on the 2021-22 and 2022-23 seasons to assess the models' generalization and real-world applicability.

The data consisted of NBA per-game statistics and MVP voting data for players from 1991 to 2021. The dataset was divided into four subsets based on the respective decades: 1991-1999 (the 90s), 2000-2009 (early 2000s), 2010-2021 (2010s), and 1991-2021 (whole dataset). Each subset was preprocessed to ensure consistency and remove any irrelevant or incomplete entries.
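The decade split described above can be sketched in a few lines of plain Python. This is only an illustration: the year ranges come from the text, while the subset labels, the `season` and `pts_per_game` field names, the `split_by_decade` helper and the toy rows are all assumptions, not the author's actual code or data.

```python
# Year ranges copied from the article; the subset labels are illustrative.
DECADE_RANGES = {
    "whole": (1991, 2021),
    "90s": (1991, 1999),
    "2000s": (2000, 2009),
    "2010s": (2010, 2021),
}


def split_by_decade(rows, year_key="season"):
    """Group player-season rows into the four (overlapping) subsets."""
    subsets = {name: [] for name in DECADE_RANGES}
    for row in rows:
        for name, (start, end) in DECADE_RANGES.items():
            if start <= row[year_key] <= end:
                subsets[name].append(row)
    return subsets


# Toy rows with made-up values (hypothetical, not real statistics).
rows = [
    {"player": "A", "season": 1995, "pts_per_game": 28.1},
    {"player": "B", "season": 2004, "pts_per_game": 24.3},
    {"player": "C", "season": 2016, "pts_per_game": 30.0},
]
subsets = split_by_decade(rows)
```

Every row lands in the "whole" subset plus exactly one decade subset, which mirrors how Model 1 shares its data with the three decade-specific models.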

All four models use Random Forest, chosen for its ability to handle complex datasets and provide reliable predictions.

  • Model 1: Trained on data from 1991 to 2021 (whole dataset).
  • Model 2: Trained on data from the 90s (1991-1999).
  • Model 3: Trained on data from the early 2000s (2000-2009).
  • Model 4: Trained on data from 2010 to 2021 (2010s).
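As a rough sketch of what training one of these models could look like, the example below fits a scikit-learn `RandomForestRegressor` on synthetic data and ranks players by predicted score. Several things here are assumptions rather than details from the study: the use of a regressor, the MVP vote share as the target, the three input features, the hyperparameter values and all of the generated numbers.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for per-game stats (assumed columns):
# points, field goals made, free throws made.
n_players = 200
X = rng.uniform(low=[0, 0, 0], high=[35, 13, 10], size=(n_players, 3))

# Made-up target: a vote share in [0, 1] loosely tied to scoring.
y = np.clip(X[:, 0] / 35 + rng.normal(0, 0.05, n_players), 0, 1)

# Hyperparameters are placeholders, not the tuned values from the study.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Rank the players of one (synthetic) season by predicted vote share.
season_X = X[:20]
predicted_share = model.predict(season_X)
top10 = np.argsort(predicted_share)[::-1][:10]  # indices of the 10 highest scores
```

In this setup, each of the four models would be the same kind of estimator fitted on a different subset of seasons, and `model.feature_importances_` gives one way to inspect which inputs drive the predictions.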

For performance evaluation, we defined a set of metrics and used them to select the best hyperparameters for each model, so that each one could reach its best score. The metrics were:

  1. Prediction of the top 5 MVP contenders for each season.
  2. Whether each top 5 contender was predicted at their actual rank in that season's final voting.
  3. Prediction of the top 10 MVP contenders for each season.
  4. Whether the actual top 5 contenders, even if not predicted in the top 5, were at least predicted in the top 10.
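One possible reading of these four metrics, written as plain Python functions over ranked lists of player names for a single season, is sketched below. The function names and the toy rankings are hypothetical, and the study may have scored the metrics differently (for example as percentages rather than counts).

```python
def top_k_hits(predicted, actual, k):
    """Count players in the actual top k that also appear in the predicted top k."""
    return len(set(predicted[:k]) & set(actual[:k]))


def exact_position_hits(predicted, actual, k):
    """Count the first k positions where the predicted player matches the actual one."""
    return sum(p == a for p, a in zip(predicted[:k], actual[:k]))


def top5_in_predicted_top10(predicted, actual):
    """Count actual top 5 players that appear anywhere in the predicted top 10."""
    return len(set(actual[:5]) & set(predicted[:10]))


# Toy rankings for one season (letters stand in for players).
actual = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
predicted = ["A", "C", "B", "D", "F", "E", "G", "H", "K", "I"]

top5_score = top_k_hits(predicted, actual, 5)              # metric 1
top5_exact = exact_position_hits(predicted, actual, 5)     # metric 2
top10_score = top_k_hits(predicted, actual, 10)            # metric 3
top5_in_top10 = top5_in_predicted_top10(predicted, actual)  # metric 4
```

For the toy rankings, four of the actual top 5 are in the predicted top 5, two of them sit at their exact position, nine of the actual top 10 are in the predicted top 10, and all five actual top 5 players appear in the predicted top 10.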

scores info

After settling on the best hyperparameters for each model, we saved each model with its respective parameters and tested it on its training, testing and validation seasons. The features with the greatest influence on the models were points per game, field goals made per game and free throws made per game.

Here are some overall results from each model:

Specific decades

Of the three decade-specific models, the one trained on the most recent decade performed best. So why not apply the recent model to earlier decades to see whether it can outperform their decade-specific models?

Applied to all

On average, it generated slightly better overall results for the earlier decades than their decade-specific models. The margin is small, but the recent-decade model captured a few more correct players season by season and ended up surpassing the previous models.

Next, we assessed the models' generalization to new seasons by checking whether they could predict the NBA MVP contenders for the 2021-22 and 2022-23 seasons.

OOT data

The recent-decade model demonstrated excellent performance, correctly predicting the top 10 MVP contenders for both seasons. It also outperformed the other models by correctly predicting Joel Embiid as the MVP for the 2022-23 season, an award he did go on to win.

In conclusion, the model trained on recent NBA seasons demonstrates superior performance and generalization capabilities, even when predicting players from past decades using recent data. It showcases robustness in capturing current player trends and performances. The extended study emphasizes the significance of training models on more recent data to achieve accurate predictions for contemporary NBA MVP contenders. Conversely, models trained on past specific decades efficiently predict top 10 MVP contenders within those eras but may face limitations in adapting to newer seasons.

About the Author

Ana Beatriz Macedo is a Data Science and AI undergraduate student in Brazil. She plans on pursuing a career in Sports Analytics and Data Science. Her favorite leagues are the NBA, NFL, and EPL. Here is a link to her LinkedIn profile, Portfolio and Data Viz Twitter account.