Discover the 3rd place winner's insights and best practices from the Movie Data Modeling Challenge!
Welcome to the "Movie Challenge Highlight Reel" series 🙌
This blog series will showcase the "best of" submissions from Paradime and Lightdash's Movie Data Modeling Challenges, highlighting the remarkable data professionals behind them.
If you're unfamiliar with the Movie Data Modeling Challenge, enrich your series experience by exploring these essential resources: the challenge introduction video and the winner's announcement blog. They offer valuable background information to help you fully appreciate the insights shared in this series.
In each "Movie Challenge Highlight Reel" blog, you'll discover:
Now let's check out our third installment, featuring Imogen Ford and her submission!
Hey there! I'm Imogen Ford. I live in the UK and I'm currently the Digital Library Coordinator (data) at Cambridge University Library. I recently participated in Paradime's Movie Data Modeling Challenge, and I'm proud to say I took third place, winning the $500 prize!
In this blog, I'll start by sharing a few insights I uncovered, and then I'll dive into how I built my project and how I used Paradime to make it all happen.
Below are three of my favorite insights; you can check out the rest in my GitHub README.md file.
Approach: Using Python, I identified the birth year and gender of each actor and director. I then merged the two data sources in join_movie_people_and_wikidata_people.sql. Finally, I created an analysis-ready model in movie_people.sql.
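To give you a flavor of that join, here's a simplified sketch (the staging model names and columns are illustrative; the real model in my repo is more detailed):

```sql
-- Simplified sketch of join_movie_people_and_wikidata_people.sql:
-- attach Wikidata attributes to each person credited on a movie.
with movie_people as (

    select person_name, person_role, movie_id
    from {{ ref('stg_movie_people') }}

),

wikidata_people as (

    select person_name, birth_year, gender
    from {{ ref('stg_wikidata_people') }}

)

select
    mp.person_name,
    mp.person_role,
    mp.movie_id,
    wd.birth_year,
    wd.gender
from movie_people mp
left join wikidata_people wd
    -- case-insensitive name match; unmatched people keep null attributes
    on lower(mp.person_name) = lower(wd.person_name)
```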
Approach: I first created tables for award nominees and winners. Next, I combined both tables and enriched each row with key information about each actor and director (age, gender, etc.) in academy_awards.sql.
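In simplified form, that union-then-enrich step looks something like this (again, model and column names are illustrative):

```sql
-- Simplified sketch of academy_awards.sql: stack nominees and winners,
-- then enrich each row with person attributes from movie_people.
with nominations as (

    select person_name, award_category, false as won
    from {{ ref('award_nominees') }}

    union all

    select person_name, award_category, true as won
    from {{ ref('award_winners') }}

)

select
    n.person_name,
    n.award_category,
    n.won,
    p.birth_year,
    p.gender
from nominations n
left join {{ ref('movie_people') }} p
    on n.person_name = p.person_name
```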
Approach: To properly analyze all movies, I first joined the TMDB and IMDb movie datasets. Next, I accounted for movies with null budget values in all_movies_combined.sql. Finally, I created an analysis-ready model in dwh_movies.sql.
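The budget cleanup is worth showing, because zero or missing budgets can quietly skew averages. Here's a simplified version of the logic (column and staging names are illustrative):

```sql
-- Simplified sketch of all_movies_combined.sql: join TMDB and IMDb
-- movies, and treat zero budgets as unknown rather than as real values.
select
    t.movie_id,
    coalesce(t.title, i.title) as title,
    nullif(t.budget, 0)        as budget,  -- 0 becomes null, so avg() ignores it
    i.imdb_rating
from {{ ref('stg_tmdb_movies') }} t
left join {{ ref('stg_imdb_movies') }} i
    on t.imdb_id = i.imdb_id
```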
This was my first time using dbt, but I have significant experience with Snowflake. Instead of doing my initial data exploration in the Snowflake interface, I connected Snowflake to DBeaver, a database tool I find highly valuable for data exploration. During my exploration, I came up with a few key questions I wanted to answer.
I quickly realized that the provided data couldn't answer my questions, so I used Python to scrape additional data from Wikidata, most importantly the birth year and gender of each actor and director.
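If you're curious what that looked like, here's a stripped-down sketch that queries Wikidata's public SPARQL endpoint (the real script batches requests and handles messier name matching):

```python
import requests

# Wikidata's public SPARQL endpoint
ENDPOINT = "https://query.wikidata.org/sparql"

# Simplified query: birth year and gender for a person matched by English label.
QUERY_TEMPLATE = """
SELECT ?personLabel ?birthYear ?genderLabel WHERE {
  ?person rdfs:label "%s"@en ;
          wdt:P569 ?dob ;     # P569 = date of birth
          wdt:P21 ?gender .   # P21 = sex or gender
  BIND(YEAR(?dob) AS ?birthYear)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1
"""

def fetch_person(name: str) -> dict | None:
    """Return birth year and gender for `name`, or None if not found."""
    response = requests.get(
        ENDPOINT,
        params={"query": QUERY_TEMPLATE % name, "format": "json"},
        headers={"User-Agent": "movie-challenge-sketch/0.1"},  # Wikidata asks for a UA
    )
    response.raise_for_status()
    rows = response.json()["results"]["bindings"]
    if not rows:
        return None
    row = rows[0]
    return {
        "name": row["personLabel"]["value"],
        "birth_year": int(row["birthYear"]["value"]),
        "gender": row["genderLabel"]["value"],
    }

print(fetch_person("Greta Gerwig"))
```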
With my questions finalized and the data necessary to answer them, I had to tackle the next challenge—learning dbt! To my surprise, it wasn't the steepest learning curve. I found countless resources all over the internet for guidance.
For more info on how I built my project, check out my GitHub repo, as well as my data lineage:
I found Paradime incredibly useful and easy to use for dbt development. These features were particularly valuable:
The integrated terminal within the Code IDE made working with git and running dbt commands incredibly easy. I was able to stage, commit, and push my changes, as well as open pull requests. Additionally, I could run all my dbt commands and see their outputs instantly.
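A typical loop in the terminal looked something like this (the model and branch names here are just examples):

```bash
# Build and test the models I'm working on
dbt run --select movie_people academy_awards
dbt test --select movie_people

# Stage, commit, and push the changes, then open a PR
git add models/
git commit -m "Enrich movie_people with Wikidata birth year and gender"
git push origin feature/wikidata-enrichment
```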
Through Paradime's integrated terminal, I was able to seamlessly connect to my Lightdash project for data visualization. It made the whole process of generating metrics and updating .yml files more efficient.
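For example, exposing a metric to Lightdash is just a matter of adding meta tags to a model's .yml file. Here's a simplified snippet (the metric name and description are illustrative):

```yaml
# Simplified schema.yml snippet: a Lightdash metric defined in dbt metadata
version: 2

models:
  - name: dwh_movies
    columns:
      - name: budget
        meta:
          metrics:
            average_budget:
              type: average
              description: "Average movie budget (null budgets excluded)"
```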
I really enjoyed participating in Paradime's Movie Data Modeling Challenge. It was an incredibly fun and engaging way to learn dbt, and I'm thrilled that the judges awarded me the $500 prize!
If you have any questions about my project or insights, feel free to reach out to me on LinkedIn!
Also, I highly recommend trying out Paradime for free!