NBA Challenge Rewind: From Stats to Stories
Discover Katie's insights, data modeling best practices, and her experiences in Paradime's 'NBA Data Modeling Challenge.'
Katie Shaffer
Jun 13, 2024 · 5 min read
Welcome to the "NBA Challenge Rewind" series 🙌
This blog series will showcase the “best of” submissions from Paradime’s NBA Data Modeling Challenges, highlighting the remarkable data professionals behind them.
If you’re unfamiliar with the NBA Data Modeling Challenge, enrich your series experience by exploring these essential resources: the challenge introduction video and the winner’s announcement blog. They offer valuable background information to help you fully appreciate the insights shared in this series.
In each "NBA Challenge Rewind" blog, you’ll discover:
Key NBA insights: Uncover the valuable insights participants derived from historical NBA datasets, revealing hidden stories within the game.
Analytics Engineering best practices: Learn about the participants' approach to project execution, from initial analysis to final insights, including their coding techniques (SQL, dbt™) and the innovative use of tools (Paradime, Snowflake, data visualization).
A Personal Touch: Get to know the motivations, backgrounds, and personal narratives of the analytics professionals who bring the NBA data to life.
A personal invitation to Paradime's next challenge: We're moving from the basketball court to the cinema—get your popcorn ready! 🍿
Let’s kick off with our first installment, exploring Katie Shaffer and her submission!
Katie's path to the challenge
Hi, I’m Katie Shaffer, a Lead Data Analyst at Wellthy. For 13 years, I’ve been immersed in the data space, with a focus on healthcare.
I stumbled upon the NBA Data Modeling Challenge through a LinkedIn post, and it sparked my interest as a unique blend of my professional skills and my personal interest in basketball. As part of a small data team, I occasionally delve into Analytics Engineering, which made this challenge not only a perfect opportunity to keep my dbt™ skills sharp, but also an ideal project to enhance my portfolio.
After registering to join the challenge, I wasn’t sure exactly what to expect; I had never participated in a challenge like this before. I expected it to be tough (and that it was!) but also fun. It helped me sharpen my analytics engineering expertise and introduced me to new tools!
Toolkit for Success
My exploration was powered by a mix of familiar and new tools:
Paradime for SQL & dbt™ development: This was my first time using Paradime, which I found both intuitive and easy to use in conjunction with other tools.
Snowflake for computing and storage: I use Snowflake in my current role, so this was quite familiar.
Tableau Public and Datawrapper for data visualizations: This was my first time using Datawrapper, and I found it incredibly useful for data storytelling.
Navigating data insights and challenges
I started by reviewing the data and the available documentation. I ran exploratory queries in Snowflake to understand the structure and content of the data. From there, I used SQL and dbt™ in Paradime for modeling. Then, I finished by creating visualizations in Tableau and Datawrapper.
Of course, there were challenges throughout the process. Several insights that I thought would be interesting turned out to be dead ends, or simply weren’t significant enough to compete for the prizes.
The amount of data was initially overwhelming. However, I soon realized that picking a direction and focusing on a subset of the data made it more manageable. The subsets of data I focused on were:
COMMON_PLAYER_INFO - Contextual information per player from the 1946-2023 NBA seasons.
PLAYER_GAME_LOGS - Statistics per player per game from the 1946-2023 NBA seasons.
PLAYER_SALARIES_BY_SEASON - Annual salary per player, per year from the 1990-2023 NBA seasons.
TEAM_SPEND_BY_SEASON - Annual spend per team, per year from the 1990-2023 NBA seasons.
TEAM_STATS_BY_SEASON - Annual statistics for each team from the 1950-2023 NBA seasons.
Now, let’s dive into my insights!
Insights unveiled
Largest record improvement in a single season
A look at the biggest single-season turnarounds in NBA history, comparing each team's season record to its record the prior season.
Insight: The 2007-08 Celtics had the biggest turnaround in NBA history. They won 24 games in 2006-07 and 66 the following season, when they also won the league championship. The 1999-00 Los Angeles Lakers pulled off a similar feat to win the league championship, improving by 36 wins over the prior season.
Approach:
Started with team_stats_by_season.sql to gather regular season statistics.
Calculated each team's winning percentage in teams_season.sql, then applied the LAG() function to compare it to the team's previous season.
Employed the DENSE_RANK() function in agg_team_ranks_by_season.sql to rank teams based on the magnitude of their improvement from one season to the next.
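The LAG() and DENSE_RANK() steps above can be sketched in a few lines. This is a minimal illustration, not the actual models from the submission: it runs the SQL through Python's built-in sqlite3 module (assuming a bundled SQLite of 3.25 or later, which supports window functions) rather than Snowflake and dbt™. The column names (team, season, wins, losses) and every row of data are hypothetical placeholders, and the CTE names simply borrow the model file names mentioned above.

```python
import sqlite3

# Hypothetical toy rows: (team, season start year, wins, losses).
# These are made-up placeholders, not real NBA records.
rows = [
    ("Team A", 2001, 20, 62),
    ("Team A", 2002, 50, 32),
    ("Team B", 2001, 41, 41),
    ("Team B", 2002, 45, 37),
    ("Team C", 2001, 60, 22),
    ("Team C", 2002, 55, 27),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_stats_by_season "
    "(team TEXT, season INTEGER, wins INTEGER, losses INTEGER)"
)
conn.executemany("INSERT INTO team_stats_by_season VALUES (?, ?, ?, ?)", rows)

query = """
WITH win_pcts AS (
    -- Winning percentage per team per season
    SELECT team, season, 1.0 * wins / (wins + losses) AS win_pct
    FROM team_stats_by_season
),
teams_season AS (
    -- LAG() pulls the same team's winning percentage from the prior season
    SELECT team, season, win_pct,
           LAG(win_pct) OVER (PARTITION BY team ORDER BY season) AS prev_win_pct
    FROM win_pcts
),
improvement AS (
    -- A team's first season has no prior season to compare against
    SELECT team, season, win_pct - prev_win_pct AS pct_change
    FROM teams_season
    WHERE prev_win_pct IS NOT NULL
)
-- DENSE_RANK() orders seasons by the size of the improvement
SELECT team, season, pct_change,
       DENSE_RANK() OVER (ORDER BY pct_change DESC) AS improvement_rank
FROM improvement
ORDER BY improvement_rank, team
"""

ranked = conn.execute(query).fetchall()
for team, season, pct_change, rank in ranked:
    print(rank, team, season, round(pct_change, 3))
```

DENSE_RANK() is a reasonable fit here because two teams with identical improvements share a rank without leaving a gap in the numbering that follows.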
Playoff team winning percentages over time
Visualization of the minimum and maximum regular season winning percentage for playoff teams, as well as the winning percentage of the team that won the league title.
Insight: The 1994-95 Houston Rockets stood out as champions despite their lower winning percentage, revealing the unpredictability of playoff success.
Approach:
Utilized team_stats_by_season.sql to identify NBA champions for each season and their respective regular season winning percentages.
In teams_season.sql, analyzed winning percentages to determine regular season performance.
Implemented the DENSE_RANK() function in agg_team_ranks_by_season.sql to identify which champion had the lowest regular-season winning percentage, illustrating the unpredictable nature of playoff success.
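As a rough sketch of the steps above, the example below computes the minimum and maximum winning percentage among playoff teams per season alongside the champion's, then ranks champions from lowest winning percentage up. It is not the code from the submission: it uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later for window functions), the made_playoffs and is_champion flag columns are assumptions about how the data might be shaped, and all rows are made-up placeholders.

```python
import sqlite3

# Hypothetical toy rows:
# (team, season, wins, losses, made_playoffs, is_champion).
# Made-up placeholders, not real NBA records.
rows = [
    ("Team A", 2001, 60, 22, 1, 1),
    ("Team B", 2001, 42, 40, 1, 0),
    ("Team C", 2001, 30, 52, 0, 0),
    ("Team A", 2002, 58, 24, 1, 0),
    ("Team B", 2002, 47, 35, 1, 1),
    ("Team C", 2002, 40, 42, 1, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_stats_by_season (team TEXT, season INTEGER, "
    "wins INTEGER, losses INTEGER, made_playoffs INTEGER, is_champion INTEGER)"
)
conn.executemany("INSERT INTO team_stats_by_season VALUES (?, ?, ?, ?, ?, ?)", rows)

# A view standing in for a teams_season model that adds winning percentage
conn.execute("""
CREATE VIEW teams_season AS
SELECT team, season, made_playoffs, is_champion,
       1.0 * wins / (wins + losses) AS win_pct
FROM team_stats_by_season
""")

# Per season: range of playoff-team winning percentages plus the champion's
summary = conn.execute("""
SELECT season,
       MIN(win_pct) AS min_playoff_win_pct,
       MAX(win_pct) AS max_playoff_win_pct,
       MAX(CASE WHEN is_champion = 1 THEN win_pct END) AS champion_win_pct
FROM teams_season
WHERE made_playoffs = 1
GROUP BY season
ORDER BY season
""").fetchall()

# Champions ranked from lowest regular-season winning percentage upward
champions = conn.execute("""
SELECT team, season, win_pct,
       DENSE_RANK() OVER (ORDER BY win_pct ASC) AS low_pct_rank
FROM teams_season
WHERE is_champion = 1
ORDER BY low_pct_rank
""").fetchall()

for row in summary:
    print(row)
for row in champions:
    print(row)
```

The per-season summary is the kind of result that feeds a min/max band chart with the champion overlaid, while the ranked list surfaces the champion with the lowest regular-season winning percentage.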
Where to go from here
After tackling the NBA Data Modeling Challenge, I’ve come away with more than just sharpened skills—it’s been a genuinely fun ride. If you’re curious about any part of my process or have suggestions, I’d be thrilled to connect on LinkedIn.
Looking forward, Paradime’s got something exciting on the horizon: a challenge centered around movie data in April. It’s a shift from basketball to the big screen, and honestly, I can’t wait to see what we can uncover within movie datasets. There’s something special about diving into the numbers behind the stories we love on screen. So, if you’ve got a knack for data and love movies, this is your chance to explore, learn, and compete for the $500, $1,000, and $1,500 prizes!