Blockbuster Analytics: Paradime's Movie Data Modeling Challenge Highlights
Learn about our dbt™ Data Modeling Challenge - the movie edition, the winners, and the insights uncovered.
Parker Rogers
Jun 14, 2024
Introduction
As the lights dim and the credits roll, it's time to reveal the results of the Movie Data Modeling Challenge! Join us as we celebrate our participants' incredible talents and delve into the groundbreaking insights they've uncovered.
About Paradime's Movie Data Modeling Challenge
At Paradime, we empower analytics engineers worldwide. The Movie Data Modeling Challenge offered a platform for experts to showcase their skills and compete for Amazon gift cards worth $500, $1,000, and $1,500.
Participants used extensive movie datasets to demonstrate their capabilities, uncover new insights, and highlight the role of analytics engineering in organizations.
Challenge Overview
The challenge was open to anyone with SQL and dbt™ experience. Participants received access to Paradime for SQL and dbt™ development, Snowflake for compute and storage (pre-loaded with three historical movie and TV datasets), Lightdash for data visualization, and a GitHub repository with pre-configured models.
The goal: create captivating analyses for movie fans.
Participants had thirty days to build their projects, which were then independently scored by a panel of five judges, including myself.
Judging Criteria
With over 300 participants and 30 standout submissions to review, we scored each submission on four criteria:
Value of insights (1-10): Relevance for movie fans
Complexity of insights (1-10): New dataset relationships and comprehensive conclusions
Quality of materials (1-10): Professional standard of code, visualizations, and insights
Integration of new data (1-10): Effective use of new, relevant data
After thorough scoring, the judges selected the top three winners!
Celebrating Our Top Three Participants
While these three participants secured the top spots, we extend heartfelt congratulations to everyone who participated. Their work was exceptional, and we'll showcase it in the next section.
First place: Isin Pesch, Data Analytics Engineer (Product) at Deel
By unanimous approval of the judges, Isin's submission won the grand prize: a $1,500 Amazon gift card! Her project excelled in all four major categories: value, complexity, quality, and integration of new data. In addition to using SQL and dbt™, Isin used Python to adjust movie revenues for inflation, ensuring accurate comparisons of financial performance across different years.
Additionally, Isin developed a unique metric called Combined Movie Success, which aims to identify the greatest movies of all time by weighting various success metrics, including revenue, Rotten Tomatoes rating, IMDb votes, major awards won, and more.
Here are some of the top-notch insights she uncovered:
Next, let's look at the second-place winner!
Second place: Rasmus Sørensen, Lead Product Data Analyst at Lunar
Rasmus' project wowed the judges, earning him a $1,000 Amazon gift card! He provided valuable, high-quality, and technically complex insights. Notably, he performed a unique analysis of the correlation between movie ratings in two major movie databases: TMDB and IMDb.
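To make the idea of a rating correlation concrete, here is a minimal Python sketch that computes a Pearson correlation coefficient between two sets of ratings. This is an illustration only, not Rasmus' actual method: the ratings are made-up values, and the function name `pearson` is hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / sqrt(spread_x * spread_y)


# Made-up ratings for the same five movies on each site (0-10 scale).
tmdb_ratings = [8.5, 7.2, 6.1, 9.0, 5.4]
imdb_ratings = [8.8, 7.0, 6.5, 9.1, 5.0]

r = pearson(tmdb_ratings, imdb_ratings)
print(f"Pearson r = {r:.3f}")
```

A value close to 1 would suggest the two databases rank movies similarly, while a value near 0 would suggest little linear relationship.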
Additionally, Rasmus' project precisely answers fascinating movie industry questions, like:
Third place: Imogen Ford, Digital Library Coordinator (Data) at Cambridge University Library
Securing third place in the Movie Data Modeling Challenge is impressive, especially since it was Imogen's first time using dbt™! Leveraging her 10+ years of experience in software development and data engineering, Imogen's submission showcases thought-provoking insights, such as:
Now that we've praised our winners, let's dive into some of the top insights from the challenge!
Top Insights from the Movie Data Modeling Challenge
In no particular order, here are insights that jumped off the judges' screens:
Top ten movies by combined success
Author: Isin Pesch (Data Analytics Engineer (Product), Deel)
Insight: According to Isin's ultimate combined success metric, "Batman - The Dark Knight" is the most successful movie of all time by more than one full unit.
Approach: Isin developed int_movies_mapping.sql to aggregate individual success metrics like revenue, Rotten Tomatoes rating, IMDb votes, and major awards. She then used int_combined_movie_success.sql to normalize these metrics and combine them into a single success rating.
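The normalize-then-combine step can be sketched in a few lines of Python. This is not Isin's implementation: the post does not say which normalization or weights she used, so the min-max scaling, the equal weights, and all of the movie data below are assumptions chosen for illustration.

```python
def min_max(values):
    """Scale values to the 0-1 range (assumed normalization, not confirmed by the source)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical movies and metric values.
movies = ["Movie A", "Movie B", "Movie C"]
metrics = {
    "revenue": [900.0, 300.0, 600.0],
    "rating": [94.0, 70.0, 88.0],
    "votes": [2_500_000, 400_000, 1_000_000],
    "awards": [2, 0, 5],
}
# Equal weights are an assumption; any weighting scheme could be substituted.
weights = {"revenue": 0.25, "rating": 0.25, "votes": 0.25, "awards": 0.25}

normalized = {name: min_max(values) for name, values in metrics.items()}
scores = {
    movie: sum(weights[name] * normalized[name][i] for name in metrics)
    for i, movie in enumerate(movies)
}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Normalizing first keeps a large-magnitude metric such as revenue from drowning out smaller-scale metrics such as award counts.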
Actor Age Distribution by Gender
Author: Imogen Ford (Digital Library Coordinator (Data), Cambridge University Library)
Insight: Men outnumber women in most age groups among both actors and directors, with a more significant gap among directors. Directors also tend to be older than actors on average. The most common age for female actors is 26, while for male actors it is 36. Notably, there are more female actors than male actors in the 16 to 27 age range, after which men dominate.
Approach: Using Python, Imogen classifies the birth year and gender of actors and directors. She then merges the two data sources in join_movie_people_and_wikidata_people.sql. Finally, she creates the analysis-ready model in movie_people.sql.
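As a rough illustration of how a "most common age by gender" figure can be derived once birth year and gender are known, here is a small Python sketch. The records are invented, and measuring age at a movie's release year is an assumption; the post does not say how age was measured in Imogen's analysis.

```python
from collections import Counter, defaultdict

# Hypothetical credits: (gender, birth_year, release_year).
credits = [
    ("female", 1990, 2016),
    ("female", 1985, 2011),
    ("female", 1980, 2015),
    ("male", 1975, 2011),
    ("male", 1980, 2016),
    ("male", 1990, 2015),
]

# Count how often each age appears, separately for each gender.
age_counts = defaultdict(Counter)
for gender, birth_year, release_year in credits:
    age_counts[gender][release_year - birth_year] += 1

# The most common age (the mode) for each gender.
most_common_age = {
    gender: counts.most_common(1)[0][0] for gender, counts in age_counts.items()
}
print(most_common_age)
```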
Top Director, Writer, and Actors from the top 200 highest grossing movies of all time
Author: Leticia Bueno (BI Developer, Tecsys Inc.)
Insight: It's no surprise that Steven Spielberg has directed the most movies among the top 200 highest-grossing films. Stan Lee leads as the writer of the most in this list, and Samuel L. Jackson, the should-be trademarker of the term "Mother F*****," has acted in the most of these blockbuster hits.
Approach: In int_omdb_tmdb_imdb_joined.sql, Leticia consolidates data from two major sources, removing duplicates to produce detailed movie information. In movie_people.sql, she categorizes relevant data about people involved in movies. Finally, she creates a "One Big Table" in movies.sql, simplifying the data architecture and enhancing query performance.
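The consolidate-and-deduplicate idea can be sketched in Python as follows. This is not Leticia's SQL: the field names, the use of `imdb_id` as the shared key, and the sample records are all assumptions made for illustration.

```python
# Hypothetical records from two sources that share an "imdb_id" key.
source_a = [
    {"imdb_id": "tt001", "title": "Movie A", "revenue": 100},
    {"imdb_id": "tt002", "title": "Movie B", "revenue": None},
    {"imdb_id": "tt001", "title": "Movie A", "revenue": 100},  # duplicate row
]
source_b = [
    {"imdb_id": "tt002", "title": "Movie B", "revenue": 250},
    {"imdb_id": "tt003", "title": "Movie C", "revenue": 75},
]

# Keep one row per key, filling missing fields from whichever source has them.
merged = {}
for record in source_a + source_b:
    row = merged.setdefault(record["imdb_id"], {})
    for key, value in record.items():
        if row.get(key) is None:
            row[key] = value

for row in merged.values():
    print(row)
```

Keying on a single identifier yields one row per movie, which is the property a wide "One Big Table" relies on.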
Top Money-Making Production Companies
Author: Anton Goncharuk (Principal Analytics Engineer, HubSpot)
Insight: Although this doesn't directly relate to Profit or Return On Investment (ROI), Warner Bros. Pictures appears to be a leader in the movie production industry based on gross revenue (box office), the total number of movies produced, and the number of Oscars their movies have received.
Approach: Anton developed int_tmdb_media.sql to consolidate movie data from TMDB. Next, he used media.sql to adjust financial figures for inflation and produce a comprehensive, unified dataset of movie details.
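Inflation adjustment typically scales a nominal amount by the ratio of a price index in a target year to the index in the original year. The Python sketch below shows that arithmetic only; it is not Anton's SQL, the function name `adjust_for_inflation` is hypothetical, and the index values are made up rather than real CPI figures.

```python
# Made-up price index values by year (NOT real CPI data).
price_index = {1990: 130.0, 2005: 195.0, 2023: 305.0}


def adjust_for_inflation(amount, year, target_year=2023):
    """Express a nominal amount from `year` in `target_year` money."""
    return amount * price_index[target_year] / price_index[year]


# A hypothetical $100M gross from 1990, restated in 2023 terms.
adjusted = adjust_for_inflation(100.0, 1990)
print(f"{adjusted:.1f}")
```

With all revenues restated in the same year's money, a film from decades ago can be compared fairly with a recent release.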
Welcome to the Dark Side - Top Razzies Winning Films
Author: Santiago Orozco Rivillas (Data Engineer, Prediktia)
The Golden Raspberry Awards, or Razzies, recognize the worst in film each year.
Insight: Adam Sandler's "Jack and Jill" (2011) won eight Razzies, including Worst Picture and Worst Actor. Lindsay Lohan's "I Know Who Killed Me" (2007) and Paul Verhoeven's "Showgirls" (1995) each received seven. "Battlefield Earth" (2000) and "Cats" (2019) also earned multiple Razzies. Despite their negative reception, some have gained cult followings, highlighting the complex relationship between critics and audiences.
Approach: In stg_movie_awards.sql, Santiago extracts and filters movie award data to ensure each record has an associated movie. Next, in int_awards_categories_normalize.sql, he standardizes award categories. Finally, in most_awarded_movies.sql, he aggregates nominations and wins for each movie, calculates win rates, and integrates budget information, providing a comprehensive view of award performance and financial metrics.
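The aggregation and win-rate step can be illustrated with a short Python sketch. It is not Santiago's model: the award rows are invented, and defining win rate as wins divided by nominations is an assumption about how the rate was calculated.

```python
from collections import defaultdict

# Hypothetical award rows: (movie, category, won).
award_rows = [
    ("Movie A", "Worst Picture", True),
    ("Movie A", "Worst Actor", True),
    ("Movie A", "Worst Director", False),
    ("Movie B", "Worst Picture", False),
    ("Movie B", "Worst Screenplay", True),
]

# Count nominations and wins per movie.
totals = defaultdict(lambda: {"nominations": 0, "wins": 0})
for movie, _category, won in award_rows:
    totals[movie]["nominations"] += 1
    if won:
        totals[movie]["wins"] += 1

# Win rate = wins / nominations (assumed definition).
win_rates = {
    movie: counts["wins"] / counts["nominations"] for movie, counts in totals.items()
}
print(win_rates)
```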
These insights are just a fraction of the remarkable work produced by our participants. For a deeper dive into their analyses, visit the paradime-dbt-movie-challenge repo and explore the diverse range of submissions!
Conclusion
That's a wrap for Paradime's Movie Data Modeling Challenge! Big cheers to everyone who participated and shared their analytics engineering expertise.
Paradime is constantly running data modeling challenges, and our next one starts in July! If you're interested in participating, pre-register below!