Discover Chris Hughes's insights, data modeling best practices, and his experiences in Paradime's 'NBA Data Modeling Challenge.'
Welcome to the "NBA Challenge Rewind" series 🙌
This blog series will showcase the “best of” submissions from Paradime’s NBA Data Modeling Challenges, highlighting the remarkable data professionals behind them.
If you’re unfamiliar with the NBA Data Modeling Challenge, enrich your series experience by exploring these essential resources: the challenge introduction video and the winner’s announcement blog. They offer valuable background information to help you fully appreciate the insights shared in this series.
In each "NBA Challenge Rewind" blog, you’ll discover:
Let’s check out our sixth installment, featuring Chris Hughes and his submission!
Hey There! My name is Chris Hughes, a data analytics expert specializing in marketing, product, and operations strategies for various companies. I recently moved to LA, where I'm running Hughes Analytics, a consulting service that supports businesses with their data engineering and analytics needs.
I learned about the NBA Data Modeling Challenge through a LinkedIn post, and I saw it as a perfect opportunity to add a comprehensive analytics engineering project to my portfolio and to learn new tools like Paradime. Going into the challenge, my goal was to win, and I'm thrilled to share that I placed second and won a $1,000 Amazon gift card!
In this blog, I'll share the journey of building my project, as well as the insights I uncovered!
To tackle this challenge, I utilized a mix of required and optional data tools:
Required Tools:
Optional Tools:
Python for Predictive Modeling: I integrated Python to develop a predictive model for player salaries, aiming to determine whether players were under- or over-performing relative to their compensation.
Now, let’s take a look at how I built my project!
I began my project by identifying my primary audience: NBA General Managers. I brainstormed potential insights that could assist them in assembling the best teams possible, focusing on key areas such as:
Next, I conducted a thorough review of the seven historical NBA datasets provided by Paradime. These datasets generally supported my insights, but to enhance the depth of my analysis, I integrated two additional datasets:
Once I had gathered all the necessary data, I began constructing my dbt models in Paradime and visualizing the insights. However, before we explore these insights, let's first delve into the inevitable and necessary challenges I encountered before submitting my project.
Data projects invariably come with challenges, and the NBA Data Modeling Challenge was no exception.
One of the primary challenges I faced was managing my time effectively. With a multitude of potential insights to explore, prioritization became crucial. At one point, I discovered I was missing data that could have significantly enhanced my project. However, given the constraints, I decided to omit this aspect, as creating the API connection, cleaning the data, and integrating it with other data sources would have required more time than I had available.
Additionally, I allocated a substantial amount of time to developing one of my predictive models, Player Performance vs Salary. While this analysis proved valuable, it was only one component of the broader project. If I could do it over again, I would allocate more of that time to other insights.
Another major challenge was crafting a story. I didn’t want my project to be just a collection of disparate insights; instead, I aimed to weave a compelling story where each analysis built upon the previous one. Achieving this required multiple iterations of data analysis, refining the written insights, and drawing clear, insightful conclusions. To see how I addressed these challenges and crafted a compelling narrative, check out the README.md section of my submission.
See how points per game trends over the course of a player’s career.
Insight: Players tend to peak in terms of points per game (PPG) between the ages of 28 and 30. Superstar and Legendary players generally maintain more consistency throughout their careers. However, Star and Role players often experience a decline after reaching the age of 32.
Approach: To analyze these trends, I utilized stg_player_game_logs.sql to aggregate each player's average PPG into player_game_logs_agg.sql. I then categorized each player into their respective experience cohort—such as Legend, Role Player, etc.—based on their average PPG, which I defined in dim_player_info.sql.
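The aggregation and cohort steps described above can be sketched in plain Python. This is only an illustration of the idea, not the SQL in my dbt models: the sample rows, the cohort thresholds in `COHORT_THRESHOLDS`, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical game-log rows: (player, age, points). Values are made up.
game_logs = [
    ("Player A", 24, 30), ("Player A", 24, 26),
    ("Player A", 29, 34), ("Player A", 29, 30),
    ("Player B", 24, 8), ("Player B", 24, 12),
    ("Player B", 29, 14), ("Player B", 29, 10),
]

# Assumed cutoffs on career average PPG, checked from highest to lowest.
# The real project's cutoffs are not reproduced here.
COHORT_THRESHOLDS = [(25.0, "Legend"), (20.0, "Superstar"), (15.0, "Star")]


def cohort_for(career_ppg):
    """Return the first cohort whose minimum PPG the player reaches."""
    for minimum, label in COHORT_THRESHOLDS:
        if career_ppg >= minimum:
            return label
    return "Role Player"


def average_points(rows, key):
    """Average points per game, grouped by key(player, age)."""
    grouped = defaultdict(list)
    for player, age, points in rows:
        grouped[key(player, age)].append(points)
    return {k: mean(v) for k, v in grouped.items()}


def ppg_by_cohort_and_age(rows):
    """Average each cohort's per-player PPG at every age."""
    career = average_points(rows, key=lambda player, age: player)
    cohorts = {player: cohort_for(ppg) for player, ppg in career.items()}
    per_player_age = average_points(rows, key=lambda player, age: (player, age))
    grouped = defaultdict(list)
    for (player, age), ppg in per_player_age.items():
        grouped[(cohorts[player], age)].append(ppg)
    return {k: mean(v) for k, v in grouped.items()}


cohort_age_ppg = ppg_by_cohort_and_age(game_logs)
```

Plotting `cohort_age_ppg` with age on the x-axis and one line per cohort would give a career-arc chart of the kind this insight is based on.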
A look at the most overvalued players during the 2022-23 season.
Insight: John Wall emerged as the most overvalued player for the 2022-23 season. Despite earning a $47 million salary, the predictive models suggest his performance merited closer to $11 million. Interestingly, he was bought out of his contract and released at the end of the season, which aligns with our findings. Another high-profile case is Ben Simmons, who has been limited by injuries in recent years and also underperformed according to our metrics. Our analysis includes other notable players who may not provide the best value for their teams according to their current contracts.
Approach: I developed this analysis using two dbt models, fact_player_performance.sql and dim_player_salaries_by_season.sql. These models generated the data that I fed into my predictive model, salary_prediction_model.py, which compares each player’s actual salary against their predicted salary based on performance metrics.
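To make the actual-versus-predicted comparison concrete, here is a minimal sketch. It is not the contents of salary_prediction_model.py: it assumes a single made-up performance score per player and a one-variable least-squares fit, and every name and number in it is hypothetical.

```python
from statistics import mean

# Hypothetical rows: (player, performance_score, actual_salary_in_millions).
players = [
    ("Player A", 10.0, 10.0),
    ("Player B", 20.0, 20.0),
    ("Player C", 30.0, 30.0),
    ("Player D", 20.0, 40.0),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rank_overvalued(rows):
    """Return (player, actual - predicted) pairs, most overpaid first."""
    scores = [score for _, score, _ in rows]
    salaries = [salary for _, _, salary in rows]
    slope, intercept = fit_line(scores, salaries)
    gaps = [
        (player, salary - (slope * score + intercept))
        for player, score, salary in rows
    ]
    return sorted(gaps, key=lambda pair: pair[1], reverse=True)


ranked = rank_overvalued(players)
most_overvalued, overpayment = ranked[0]
```

A positive gap means the player earns more than the fitted line predicts for their performance, and a negative gap means they are a bargain. A real model would use several performance metrics and would be validated on data it was not fitted on.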
Insight: The analysis confirmed Michael Jordan's 1996 contract as the highest in NBA history. It was valued at $30 million at the time, unprecedented for any player. Adjusted for inflation and today's salary cap, this amount escalates to an astounding $280 million for the 2022-23 season. In contrast, the highest contract for an active player, belonging to Steph Curry, would be $58 million under similar adjustments, markedly less than Jordan's historic payout.
Approach: To conduct this analysis, I utilized stg_inflation_data.sql and stg_salary_cap_by_season.sql to adjust each player's historical salary to its present-day value. These adjustments were made in players_salaries_by_season_adj.sql, allowing for a direct comparison across different eras based on inflation and changes in the salary cap.
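The two kinds of adjustment mentioned above can be written as simple ratios. The sketch below shows each one separately and does not reproduce how players_salaries_by_season_adj.sql combines them. The function names and all input values are hypothetical round numbers, not real CPI or salary cap figures.

```python
def adjust_for_inflation(salary, cpi_then, cpi_now):
    """Scale a historical salary by the change in a price index."""
    return salary * cpi_now / cpi_then


def adjust_for_salary_cap(salary, cap_then, cap_now):
    """Keep the salary's share of the cap constant across seasons."""
    return salary * cap_now / cap_then


# Illustrative inputs only.
historical_salary = 10_000_000

inflation_adjusted = adjust_for_inflation(
    historical_salary, cpi_then=150.0, cpi_now=300.0
)
cap_adjusted = adjust_for_salary_cap(
    historical_salary, cap_then=25_000_000, cap_now=100_000_000
)
```

With these made-up inputs, the index doubles, so the inflation-adjusted figure is $20 million, while the cap quadruples, so the cap-adjusted figure is $40 million. The gap between the two shows why the choice of adjustment matters when comparing contracts across eras.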
I’m interested in hearing what others think about my analysis. What other insights would have been useful for NBA general managers? Feel free to send me a message on LinkedIn to chat more!
My experience with the NBA Data Modeling Challenge was incredibly rewarding. Completing a full-scale analytics engineering project not only enhanced my portfolio but also gave me a valuable opportunity to showcase my skills to my network and potential employers. The competition was thrilling, and I was quite pleased with the prize!
Paradime is now hosting a new challenge, this time centered on the world of movies. Shifting from the basketball court to the big screen, this challenge promises to be an exciting exploration of movie datasets. If you're a data enthusiast with a love for films, this is your perfect opportunity to dive in, learn, and compete for prizes of $500, $1,000, and $1,500.
Sign up here to participate in the challenge and discover what insights you can uncover!