NBA Challenge Rewind: The Price of Performance - Exploring NBA Finances
Discover Chris' insights, data modeling best practices, and his experiences in Paradime's 'NBA Data Modeling Challenge.'
Chris Hughes · Jun 13, 2024 · 5 min read
Welcome to the "NBA Challenge Rewind" series 🙌
This blog series will showcase the “best of” submissions from Paradime’s NBA Data Modeling Challenges, highlighting the remarkable data professionals behind them.
If you’re unfamiliar with the NBA Data Modeling Challenge, enrich your series experience by exploring these essential resources: the challenge introduction video and the winner’s announcement blog. They offer valuable background information to help you fully appreciate the insights shared in this series.
In each "NBA Challenge Rewind" blog, you’ll discover:
Key NBA insights: Uncover the valuable insights participants derived from historical NBA datasets, revealing hidden stories within the game.
Analytics Engineering best practices: Learn about the participants' approach to project execution, from initial analysis to final insights, including their coding techniques (SQL, dbt™) and the innovative use of tools (Paradime, Snowflake, data visualization).
A Personal Touch: Get to know the motivations, backgrounds, and personal narratives of the analytics professionals who bring the NBA data to life.
A personal invitation to Paradime's next challenge: We're moving from the basketball court to the cinema—get your popcorn ready! 🍿
Let’s check out our sixth installment, featuring Chris Hughes and his submission!
Chris’ path to the challenge
Hey there! My name is Chris Hughes, and I'm a data analytics expert specializing in marketing, product, and operations strategies for various companies. I recently moved to LA, where I'm running Hughes Analytics, a consulting service that supports businesses with their data engineering and analytics needs.
I learned about the NBA Data Modeling Challenge through a LinkedIn post, and I saw it as a perfect opportunity to add a comprehensive analytics engineering project to my portfolio and to learn new tools like Paradime. Going into the challenge, my goal was to win, and I'm thrilled to share that I placed second and won a $1,000 Amazon gift card!
In this blog, I'll share the journey of building my project, as well as the insights I uncovered!
Toolkit for success
To tackle this challenge, I utilized a mix of required and optional data tools:
Required Tools:
Snowflake for Data Warehousing and Computation: Having used Snowflake on both personal and client projects for years, I navigated this tool with ease, which was crucial for efficiently handling large datasets.
Paradime for dbt Development: This was my first time using Paradime, and I found the learning curve almost non-existent. Its VSCode-like interface makes it easy to manage and build dbt projects.
Sigma for Data Visualization: Having helped several clients build self-serve dashboards with Sigma, using it for this challenge was a no-brainer! It enabled me to visually present complex data insights clearly.
Optional Tools:
Python for Predictive Modeling: I integrated Python to develop a predictive model for player salaries, aiming to determine if players were under or over-performing relative to their compensation.
Now, let’s take a look at how I built my project!
Building my project
I began my project by identifying my primary audience: NBA General Managers. I brainstormed potential insights that could assist them in assembling the best teams possible, focusing on key areas such as:
Team Performance vs. Financials: Exploring the impact of team payroll on overall performance and strategies for GMs to maximize the value of their team payroll.
Player Performance vs. Financials: Investigating how individual salaries influence player performance and how GMs can identify optimal signing choices and pricing.
Typical Player Lifecycle: Analyzing changes in player performance throughout their careers to determine the most strategic points for GMs to sign players.
Evolution of the Game: Understanding how the NBA is evolving and identifying ways for GMs to stay ahead of the curve.
Next, I conducted a thorough review of the seven historical NBA datasets provided by Paradime. These datasets generally supported my insights, but to enhance the depth of my analysis, I integrated two additional datasets:
Salary Cap by Season: This dataset includes the maximum NBA salary cap for each season, along with the minimum and maximum salaries players could earn under the NBA’s collective bargaining agreement (CBA), sourced from Spotrac.
Inflation Data: Annual inflation rates based on the Consumer Price Index (CPI) from the Bureau of Labor Statistics, crucial for making equitable comparisons of player salary data across different seasons.
Once I had gathered all the necessary data, I began constructing my dbt models in Paradime and visualizing the insights. However, before we explore these insights, let's first delve into the inevitable and necessary challenges I encountered before submitting my project.
Navigating challenges
Data projects invariably come with challenges, and the NBA Data Modeling Challenge was no exception.
One of the primary challenges I faced was managing my time effectively. With a multitude of potential insights to explore, prioritization became crucial. At one point, I discovered I was missing data that could have significantly enhanced my project. However, given the constraints, I decided to omit this aspect, as creating the API connection, cleaning the data, and integrating it with other data sources would have required more time than I had available.
Additionally, I allocated a substantial amount of time to developing one of my predictive models, Player Performance vs Salary. While this analysis proved valuable, it was only one component of the broader project. If I could do it over again, I would have allocated more of that time to other insights.
Another major challenge was crafting a story. I didn’t want my project to be just a collection of disparate insights; instead, I aimed to weave a compelling story where each analysis built upon the previous one. Achieving this required multiple iterations of data analysis, refining the written insights, and drawing clear, insightful conclusions. To see how I addressed these challenges and crafted a compelling narrative, check out the README.md section of my submission.
Insights uncovered
Average PPG by Age, Segmented by Player Status
See how points per game trends over the course of a player’s career.
Insight: Players tend to peak in terms of points per game (PPG) between the ages of 28 and 30. Superstar and Legendary players generally maintain more consistency throughout their careers. However, Star and Role players often experience a decline after reaching the age of 32.
Approach: To analyze these trends, I aggregated each player's average PPG from stg_player_game_logs.sql into player_game_logs_agg.sql. I then assigned each player to a status cohort (such as Legend or Role Player) based on their average PPG, as defined in dim_player_info.sql.
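The aggregation and cohort step described above can be sketched in a few lines of Python. This is only an illustration, not the SQL from the submission: the PPG cut-offs, the function names, and the `(player, age, points)` input shape are all assumptions, since the post does not list the actual thresholds used in dim_player_info.sql.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical PPG cut-offs, highest first. The real thresholds are defined in
# dim_player_info.sql and are not stated in this post.
COHORT_THRESHOLDS = [
    (25.0, "Legend"),
    (20.0, "Superstar"),
    (12.0, "Star"),
]


def assign_cohort(career_ppg):
    """Map a career points-per-game average to a cohort label."""
    for min_ppg, label in COHORT_THRESHOLDS:
        if career_ppg >= min_ppg:
            return label
    return "Role Player"


def avg_ppg_by_cohort_and_age(game_logs):
    """Average PPG for each (cohort, age) pair.

    game_logs: iterable of (player, age, points) tuples, one per game.
    """
    points_by_player = defaultdict(list)
    points_by_player_age = defaultdict(list)
    for player, age, points in game_logs:
        points_by_player[player].append(points)
        points_by_player_age[(player, age)].append(points)

    # Cohort is based on the player's average over all of their games.
    cohort_of = {p: assign_cohort(mean(pts)) for p, pts in points_by_player.items()}

    # Compute each player's PPG at each age, then average across the cohort.
    ppg_by_cohort_age = defaultdict(list)
    for (player, age), pts in points_by_player_age.items():
        ppg_by_cohort_age[(cohort_of[player], age)].append(mean(pts))
    return {key: mean(vals) for key, vals in ppg_by_cohort_age.items()}
```

Averaging per player first, and only then across the cohort, keeps a player with many games at a given age from dominating that age's average.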
Player Performance vs Salary
A look at the most overvalued players during the 2022-23 season.
Insight: John Wall emerged as the most overvalued player for the 2022-23 season. Despite earning a $47 million salary, the predictive models suggest his performance merited closer to $11 million. Interestingly, he was bought out of his contract and released at the end of the season, which aligns with our findings. Another high-profile case is Ben Simmons, who has been limited by injuries in recent years and also underperformed according to our metrics. Our analysis includes other notable players who may not provide the best value for their teams according to their current contracts.
Approach: I developed this analysis using the dbt models fact_player_performance.sql and dim_player_salaries_by_season.sql. These models generated the data that I fed into my predictive model, salary_prediction_model.py, which compares each player’s actual salary against their predicted salary based on performance metrics.
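To make the actual-versus-predicted comparison concrete, here is a minimal sketch. The post does not say which algorithm or features salary_prediction_model.py uses, so this stands in a single-feature least-squares line; the `performance_score` input and both function names are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rank_overvalued(players):
    """Rank players by how far actual salary exceeds predicted salary.

    players: list of (name, performance_score, actual_salary) tuples.
    Returns (name, actual - predicted) pairs, most overvalued first.
    """
    scores = [score for _, score, _ in players]
    salaries = [salary for _, _, salary in players]
    slope, intercept = fit_line(scores, salaries)
    gaps = [
        (name, salary - (slope * score + intercept))
        for name, score, salary in players
    ]
    return sorted(gaps, key=lambda gap: gap[1], reverse=True)
```

A positive gap means the player is paid more than the fitted line predicts for that level of performance (overvalued); a negative gap means the opposite.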
Highest Single Season Salaries (Adjusted for Inflation)
Insight: The analysis confirmed Michael Jordan's 1996 contract as the highest ever in NBA history, valued at $30 million at the time — unprecedented for any player. Adjusted for inflation and considering today's salary cap, this amount escalates to an astounding $280 million for the 2022-23 season. In contrast, the highest contract for an active player, belonging to Steph Curry, would be $58 million under similar adjustments, markedly less than Jordan's historic payout.
Approach: To conduct this analysis, I utilized stg_inflation_data.sql and stg_salary_cap_by_season.sql to adjust each player's historical salary to its present-day value. These adjustments were made in players_salaries_by_season_adj.sql, allowing for a direct comparison across different eras based on inflation and changes in the salary cap.
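The two adjustments mentioned above can be sketched as follows. This is an assumption-laden illustration rather than the logic in players_salaries_by_season_adj.sql: the post does not say how the inflation and salary-cap adjustments are combined, so they are shown as separate, hypothetical helper functions.

```python
def compound_inflation(amount, annual_rates):
    """Grow an amount by each annual inflation rate in turn.

    annual_rates: rates as fractions (0.03 means 3%), one per year between
    the original season and the target season.
    """
    for rate in annual_rates:
        amount *= 1 + rate
    return amount


def cap_equivalent(salary, cap_then, cap_now):
    """Restate a salary so it takes up the same share of the salary cap.

    A salary equal to 120% of the cap in its own season maps to 120% of
    the cap in the target season.
    """
    return salary / cap_then * cap_now
```

The first function answers "what is this salary worth in today's dollars?", while the second answers "what would a contract taking the same share of the cap look like today?". The two can give very different numbers when the cap has grown faster than consumer prices.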
I’m interested in hearing what others think about my analysis. What other insights would have been useful for NBA general managers? Feel free to send me a message on LinkedIn to chat more!
Where to go from here
My experience with the NBA Data Modeling Challenge was incredibly rewarding. Completing a full-scale analytics engineering project not only enhanced my portfolio but also gave me a valuable opportunity to showcase my skills to my network and potential employers. The competition was thrilling, and I was quite pleased with the prize!
Paradime is now hosting a new challenge, this time centered on the world of movies. Shifting from the basketball court to the big screen, this challenge promises to be an exciting exploration of movie datasets. If you're a data enthusiast with a love for films, this is your perfect opportunity to dive in, learn, and compete for prizes of $500, $1,000, and $1,500.
Sign up here to participate in the challenge and discover what insights you can uncover!