Published on August 4, 2023 by Rohan Tummala  

"2009 NBA Playoffs" by RMTip21 is licensed under CC BY-SA 2.0 .

Have you ever wondered how sentiment towards different NBA players changes over the course of a season? What is the public consensus on certain players, and how do fans perceive them? This project explores data from Reddit to better understand public opinion of different NBA players. The exploration included creating word clouds for players as well as tracking how often particular words were used in relation to them over time.

Data and Preprocessing

To scrape the necessary data from Reddit, the API wrapper PRAW was used, which allows users to gather posts from any subreddit. PRAW supports several sort orders, including hot, new, top, and controversial. It is important to note that without specifying a sort order, the results would have been heavily biased toward recent posts about each player. To get a more balanced set of old and new posts, sorting by ‘top’ ended up being the best option, as it returns the titles of the posts with the most interactions over a specified time period. This function was run for several players, and both the date and title of each post were added to a pandas dataframe for each player. An example for Jimmy Butler is shown. A total of 1,000 titles were scraped for each player.

 

example data from reddit
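The scraping step could be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the subreddit (r/nba), the use of search, the `time_filter` value, the credentials, and the helper names `titles_to_rows` and `fetch_top_posts` are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def titles_to_rows(submissions):
    """Turn submission-like objects into rows holding a date and a title."""
    rows = []
    for post in submissions:
        # PRAW exposes the creation time as a Unix timestamp in UTC.
        created = datetime.fromtimestamp(post.created_utc, tz=timezone.utc)
        rows.append({"date": created.date(), "title": post.title})
    return rows


def fetch_top_posts(player, limit=1000):
    """Search a subreddit for a player's top posts (hypothetical setup)."""
    import praw  # third-party; imported lazily so titles_to_rows stays dependency-free

    reddit = praw.Reddit(
        client_id="YOUR_CLIENT_ID",          # placeholder credentials
        client_secret="YOUR_CLIENT_SECRET",  # placeholder credentials
        user_agent="nba-sentiment-sketch",   # assumed user agent string
    )
    # Assumption: r/nba and a one-year window are illustrative choices only.
    return reddit.subreddit("nba").search(
        player, sort="top", time_filter="year", limit=limit
    )
```

With valid credentials, `titles_to_rows(fetch_top_posts("Jimmy Butler"))` would produce a list of rows that can be passed to `pandas.DataFrame` to get one dataframe per player.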

Once the data had been scraped, some cleanup was necessary before creating the visualizations. The first step was to remove all “stop words” from the titles, which are words that carry little meaning on their own, such as “and”, “is”, or “the”. A list of such words was compiled and filtered out of the data. In addition, all text was converted to lowercase so that the same word with different capitalization would not appear twice in the word cloud. Finally, words were stemmed, meaning that variations of the same word were reduced to a common base. For example, “passing” was reduced to “pass” using the nltk library.
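The three preprocessing steps can be sketched as a single function. This is only an illustration under stated assumptions: the stop-word list below is a tiny hand-picked sample, the tokenizing regular expression is a simplification, and the choice of NLTK's Porter stemmer is a guess, since the post does not say which nltk stemmer was used.

```python
import re

# Assumption: a tiny hand-picked stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "is", "the", "of", "in", "to", "for"}


def nltk_stem():
    """Return NLTK's Porter stemming function (requires the nltk package)."""
    from nltk.stem import PorterStemmer  # imported lazily; third-party

    return PorterStemmer().stem


def preprocess(title, stem, stop_words=STOP_WORDS):
    """Lowercase a title, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-z0-9']+", title.lower())
    return [stem(token) for token in tokens if token not in stop_words]
```

A call such as `preprocess(title, nltk_stem())` would then be applied to every scraped title. Note that stems produced this way are not always dictionary words, so the labels in a word cloud may look slightly truncated.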

Word Cloud Generation

With the data fully prepared, it was time to carry out the analysis and create the visualizations. The plan was to create word clouds for some of the most discussed players in the 2023 NBA Playoffs and to graph how often certain words were used to describe them over the course of the postseason. The players examined were Jimmy Butler, Nikola Jokic, Jaylen Brown, and James Harden. All four were hotly debated at some point in the playoffs, which is why they were chosen. For Butler and Jokic, it was hypothesized that words regarding playoff greatness would be the most common. For Brown and Harden, it was predicted that the discussion would be much more controversial. To test these hypotheses, word clouds were generated for each player using the data described earlier.
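A word cloud is essentially a picture of word frequencies, so the generation step can be sketched in two parts: counting tokens, then rendering. The post does not name the rendering tool, so the `wordcloud` package, the image size, and the helper names here are assumptions.

```python
from collections import Counter


def word_frequencies(token_lists):
    """Count how often each token appears across all preprocessed titles."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def save_word_cloud(frequencies, path):
    """Render frequencies as an image (requires the third-party wordcloud package)."""
    from wordcloud import WordCloud  # imported lazily; an assumed tool choice

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)
```

Larger counts translate into larger words in the image, which is why the most frequent terms dominate each player's cloud.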

 

word clouds from reddit data

 

As can be seen from the word clouds, the words used to describe Butler and Jokic are more positive, with a larger focus on the history they were making in the playoffs and their impressive statistics, as evidenced by “triple double” and “history” being some of the largest words in the clouds. Overall, this matched up well with the original hypothesis. For Brown and Harden, although there weren’t many “negative” words as the hypothesis predicted, there was a lot of talk about referees, trades, and contracts. 

Sentiment over Time

The second part of this project was to determine how the use of certain words to describe players changed over the course of the playoffs. These words were chosen based on the word cloud visualizations, picking the ones that appeared most prominent and interesting to examine. The first step was counting the number of times the words “history” or “historic” appeared in Reddit post titles about Jimmy Butler and Nikola Jokic. A dataframe was created to hold these values, and occurrences that fell in the same week were grouped together. The resulting charts are shown below.
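The weekly grouping described above can be sketched with the standard library alone, rather than the dataframe the project used. The Monday week start, the row layout (a `date` and a `title` per row), and the function names are assumptions for illustration.

```python
import re
from collections import Counter
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day` (assumed convention)."""
    return day - timedelta(days=day.weekday())


def weekly_mentions(rows, terms):
    """Count occurrences of any tracked term in titles, grouped by week."""
    terms = set(terms)
    counts = Counter()
    for row in rows:
        words = re.findall(r"[a-z']+", row["title"].lower())
        hits = sum(1 for word in words if word in terms)
        if hits:
            counts[week_start(row["date"])] += hits
    return dict(sorted(counts.items()))
```

Calling `weekly_mentions(rows, {"history", "historic"})` yields one count per week, which can then be plotted as a line or bar chart. If the titles were stemmed beforehand, the tracked terms would need to be stemmed the same way.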

 

jokic tracker

 

 

butler tracker

For Nikola Jokic, the graph shows that the word “history” was used most in mid to late May, which makes sense, as this was during his series against the Los Angeles Lakers, in which he broke multiple playoff records while stuffing the stat sheet with triple doubles. For Jimmy Butler, the peak came during his series against the Bucks, which also makes sense, as he broke several Miami Heat playoff scoring records in this five-game span. It is interesting to note how rarely the word appeared in posts about Jokic in the early stages of the playoffs, when he was arguably playing even better than in the short four-game series against the Lakers, where he did not need to do much. This could possibly be explained by a lack of media attention, as the Nuggets were playing the Timberwolves and Suns in the earlier rounds rather than the Los Angeles Lakers.

For Jaylen Brown, multiple terms were tracked throughout the playoffs: “contract”, “money”, and “trade”. Below is the result:

 

jaylen tracker

Most of the talk regarding his impending contract decision and a possible trade came after Game 7 of the Eastern Conference Finals against the Miami Heat, which can be explained by his poor performance in that game. The spike in trade talk during the week of May 8th is interesting, however, as the Celtics were playing well during this period.

Finally, the word “foul” was tracked for James Harden given its prominence in the word cloud. The goal was to see whether people attributed his poor performance and the 76ers’ loss in Game 7 against the Celtics to a lack of free throws or foul calls. Below is the graph:

 

harden tracker

As the results show, there is indeed a peak in the usage of the word soon after the 76ers had been eliminated from the playoffs.

Insights and Conclusions

Based on both the word clouds and the word tracking charts, the hypotheses were mostly correct for both pairs of players. As predicted, the most common words used to describe Jokic and Butler included “history” and terms tied to impressive statistics, such as “triple double”. The graphs of how the use of these words changed over the course of the playoffs gave several insights into how certain performances and series receive different amounts of attention due to factors such as media coverage. For Brown and Harden, the prediction that discussion about them would be controversial was correct, though the controversy showed up as talk about referees, trades, and contracts rather than as explicitly negative words.

Future Steps

With more time, it would be best to broaden the dataset to include more social media platforms. It would also be interesting to widen the scope to cover players’ full careers instead of just the most recent playoffs. This was attempted initially, but it took quite a while to run, as there was no way to scrape only data from before a certain time period. Another question that came up was whether it is possible to explore how words appear in posts in relation to one another. In other words, are there patterns between words used to describe players that otherwise seem unrelated?
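One simple way to start on the last question would be to count how often pairs of words appear in the same title. The sketch below is only one possible starting point, not something the project implemented; the function name and the input format (one list of preprocessed tokens per title) are assumptions.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(token_lists):
    """Count how often each unordered pair of words shares a title."""
    pairs = Counter()
    for tokens in token_lists:
        # Use each distinct word once per title; sorting makes (a, b) and (b, a) identical.
        for pair in combinations(sorted(set(tokens)), 2):
            pairs[pair] += 1
    return pairs
```

Pairs with unusually high counts, such as a player's name alongside “foul”, would be natural candidates for a closer look.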

About the Author

Rohan Tummala is a current student at The Nueva School in San Mateo, CA with a deep interest in sports analytics.