Movie Challenge Rewind: Gender and Age Trends in Hollywood

Discover the 3rd place winner's insights and best practices from the Movie Data Modeling Challenge!

July 11, 2024

Welcome to the "Movie Challenge Highlight Reel" series 🙌

This blog series will showcase the "best of" submissions from Paradime and Lightdash's Movie Data Modeling Challenges, highlighting the remarkable data professionals behind them.

If you're unfamiliar with the Movie Data Modeling Challenge, enrich your series experience by exploring these essential resources: the challenge introduction video and the winner's announcement blog. They offer valuable background information to help you fully appreciate the insights shared in this series.

In each "Movie Challenge Highlight Reel" blog, you'll discover:

  • Key movie insights: Discover the scroll-stopping findings participants derived from historical movie datasets, covering movies, actors, directors, production companies, finances, and more.
  • Analytics Engineering best practices: Learn about the participants' approach to project execution, from initial analysis to final insights, including their coding techniques (SQL, dbt™) in Paradime.

Now let's check out our third installment, exploring Imogen Ford and her submission!

Introduction

Hey there! I'm Imogen Ford. I live in the UK and I'm currently the Digital Library Coordinator (data) at Cambridge University Library. I recently participated in Paradime's Movie Data Modeling Challenge, and I'm proud to say I took third place, winning the $500 prize!

In this blog, I'll start by sharing a few insights I uncovered, and then I'll dive into how I built my project and how I used Paradime to make it all happen.

Insights Uncovered

Below are three of my favorite insights, but you can check out the rest in my GitHub README.md file.

Actor Age and Gender Distribution

Approach: Using Python, I classified the birth year and gender of actors and directors. I then merged the two data sources in join_movie_people_and_wikidata_people.sql. Finally, I created an analysis-ready model in movie_people.sql.
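As a rough illustration only (this is not the project's actual code), the merge step could look something like the sketch below. It uses Python's built-in sqlite3 module so it runs anywhere. The table names, column names, and sample rows are all hypothetical, and joining on a name column is an assumption made for brevity; a stable identifier would be a safer key in practice.

```python
import sqlite3

# In-memory database with hypothetical tables and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie_people (
        person_name TEXT, role TEXT, movie_title TEXT, release_year INTEGER
    );
    CREATE TABLE wikidata_people (
        person_name TEXT, birth_year INTEGER, gender TEXT
    );
    INSERT INTO movie_people VALUES
        ('Person A', 'actor', 'Film One', 2010),
        ('Person B', 'director', 'Film One', 2010),
        ('Person C', 'actor', 'Film Two', 2015);
    INSERT INTO wikidata_people VALUES
        ('Person A', 1980, 'female'),
        ('Person B', 1965, 'male');
""")

# LEFT JOIN keeps every credited person, even when no Wikidata match exists.
# Age at release is derived from the release year and the birth year.
JOIN_SQL = """
    SELECT
        mp.person_name,
        mp.role,
        mp.movie_title,
        wp.gender,
        mp.release_year - wp.birth_year AS age_at_release
    FROM movie_people AS mp
    LEFT JOIN wikidata_people AS wp
        ON mp.person_name = wp.person_name
    ORDER BY mp.person_name
"""

rows = conn.execute(JOIN_SQL).fetchall()
for row in rows:
    print(row)
```

People without a Wikidata match come back with empty gender and age values rather than disappearing, which makes gaps in coverage easy to count before charting any distribution.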

Actors & Directors with the most award nominations

Approach: I first created tables for award nominees and winners. Next, I combined both tables and enriched each row with key information about the actor or director (age, gender, etc.) in academy_awards.sql.

Do Academy Award winners have higher budgets than nominees?

Approach: To properly analyze all movies, I first joined the TMDB and IMDb movie datasets. Next, I accounted for movies with null budget values in all_movies_combined.sql. Finally, I created an analysis-ready model in dwh_movies.sql.
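One possible way to handle missing budgets when combining two sources is sketched below. It is an illustration, not the logic in all_movies_combined.sql. The tables, columns, and figures are hypothetical, and treating a budget of 0 as missing is an assumption of this sketch.

```python
import sqlite3

# In-memory database with hypothetical tables and made-up budgets.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmdb_movies (imdb_id TEXT, title TEXT, budget INTEGER);
    CREATE TABLE imdb_movies (imdb_id TEXT, budget INTEGER);
    INSERT INTO tmdb_movies VALUES
        ('tt001', 'Film One', 1000000),
        ('tt002', 'Film Two', NULL),
        ('tt003', 'Film Three', 0);
    INSERT INTO imdb_movies VALUES
        ('tt001', 1200000),
        ('tt002', 500000);
""")

# NULLIF turns a 0 budget into NULL (an assumption: 0 means "unknown").
# COALESCE then takes the first non-NULL budget, preferring the first source.
COMBINE_SQL = """
    SELECT
        t.imdb_id,
        t.title,
        COALESCE(NULLIF(t.budget, 0), NULLIF(i.budget, 0)) AS budget
    FROM tmdb_movies AS t
    LEFT JOIN imdb_movies AS i
        ON t.imdb_id = i.imdb_id
    ORDER BY t.imdb_id
"""

combined = conn.execute(COMBINE_SQL).fetchall()

# Movies with no usable budget in either source are left out of budget comparisons.
with_budget = [row for row in combined if row[2] is not None]
print(with_budget)
```

Filling from a second source before filtering keeps more movies in the comparison than dropping every row where the first source is empty.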

Building my project

This was my first time using dbt, but I have significant experience with Snowflake. Instead of doing my initial data exploration in the Snowflake interface, I connected it to DBeaver, a DBA tool I find highly valuable for data exploration. During my exploration, I came up with a few key questions to answer:

  • What are the gender and age distributions of actors? What about directors?
  • Does movie budget impact award nominations?
  • Which countries produce the most films?

I quickly realized that the provided data couldn't answer my questions, so I used Python to scrape additional data from Wikidata.
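For readers curious what pulling birth year and gender from Wikidata with Python can look like, here is a minimal sketch using only the standard library. It is not the script used in this project. It assumes people are looked up by IMDb ID (Wikidata property P345) and reads date of birth (P569) and sex or gender (P21) from the public SPARQL endpoint. The function names and the User-Agent string are made up for the example.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"


def build_query(imdb_person_id: str) -> str:
    """Build a SPARQL query for one person, matched on IMDb ID (P345)."""
    return f"""
    SELECT ?person ?birthDate ?genderLabel WHERE {{
      ?person wdt:P345 "{imdb_person_id}" .
      OPTIONAL {{ ?person wdt:P569 ?birthDate . }}
      OPTIONAL {{ ?person wdt:P21 ?gender . }}
      SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
    }}
    """


def parse_bindings(payload: dict) -> list[dict]:
    """Reduce a SPARQL JSON response to birth year and gender per result row."""
    people = []
    for binding in payload["results"]["bindings"]:
        birth_date = binding.get("birthDate", {}).get("value")
        people.append({
            # Dates arrive as ISO strings such as "1980-05-01T00:00:00Z".
            "birth_year": int(birth_date[:4]) if birth_date else None,
            "gender": binding.get("genderLabel", {}).get("value"),
        })
    return people


def fetch_person(imdb_person_id: str) -> list[dict]:
    """Query the live endpoint (network access required; not called here)."""
    params = urllib.parse.urlencode(
        {"query": build_query(imdb_person_id), "format": "json"}
    )
    request = urllib.request.Request(
        f"{WIKIDATA_SPARQL_URL}?{params}",
        headers={"User-Agent": "movie-challenge-example/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_bindings(json.load(response))


# Offline demonstration with a hand-written payload in the SPARQL JSON shape.
sample_payload = {
    "results": {
        "bindings": [
            {
                "birthDate": {"type": "literal", "value": "1980-05-01T00:00:00Z"},
                "genderLabel": {"type": "literal", "value": "female"},
            },
            {"genderLabel": {"type": "literal", "value": "male"}},
        ]
    }
}
parsed = parse_bindings(sample_payload)
print(parsed)
```

Keeping the parsing separate from the network call means the parsing logic can be checked offline, and missing fields come back as None instead of raising errors.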

With my questions finalized and the data necessary to answer them, I had to tackle the next challenge—learning dbt! To my surprise, it wasn't the steepest learning curve. I found countless resources all over the internet for guidance.

For more info on how I built my project, check out my GitHub repo, as well as my data lineage.

How I used Paradime

I found Paradime to be incredibly useful and easy to use for dbt development. The following features were particularly valuable:

Integrated Terminal

The integrated terminal within the Code IDE made working with git and running dbt commands incredibly easy. I was able to stage, commit, and push my changes, as well as open pull requests. Additionally, I could run all my dbt commands and see their outputs instantly.

Lightdash Integration

Through Paradime's integrated terminal, I was able to seamlessly connect to my Lightdash project for data visualization. It made the whole process of generating metrics and updating .yml files more efficient.

Wrap Up

I really enjoyed participating in Paradime's Movie Data Modeling Challenge. It was an incredibly fun and engaging way to learn dbt, and I'm thrilled that the judges awarded me the $500 prize!

If you have any questions about my project or insights, feel free to reach out to me on LinkedIn!

Also, I highly recommend trying out Paradime for free!
