Foobarcat

Baconometer

Published on July 21, 2025

Building the Baconometer

Introduction

The Baconometer is a playful yet technically interesting web app that answers the question: how connected are two actors through shared films? Inspired by the “Six Degrees of Kevin Bacon” game, the app computes the shortest path of collaboration between actors using movie credits. At the time of writing the baconometer is hosted [here] (https://baconometer.foobarcat.com). The baconometer is inspired by venerable oracle of bacon.

This post outlines how the Baconometer was designed and built - from database selection, and service development to deployment strategy.


The Problem

At the core, the Baconometer is about finding connections in a network. Given two actors, how are they linked through shared films and co-stars? This is a classic graph problem, where nodes are actors and films, and edges represent acting roles in those films.

Solving this efficiently for hundreds of thousands of actors and millions of credits means choosing the right data model and tools.


Data Gathering: TMDB Over IMDb

One of the earliest hurdles was data sourcing. IMDb data is comprehensive but comes with restrictive licensing terms that prohibit many forms of redistribution and use.

Instead, I opted for TMDB (The Movie Database), which offers a more permissive API and license. However, TMDB does not provide a bulk data dump like IMDb, which made ingestion less straightforward.

To solve this, I wrote a custom TMDB crawler that recursively traverses the API:

This strategy incrementally builds a usable actor–film–actor graph suitable for import into Neo4j. The crawler is available here: tmdb-crawler.

Design Decisions

Database

Most data problems can be solved with relational databases, indeed the Oracle of Bacon is implemented using a small Postgres database (source: github).

But the Baconometer is fundamentally a graph problem:

Feature / Aspect Neo4j (Graph DB) SQL (Relational DB)
Data Model Nodes and relationships (Actor, Film, ACTED_IN) Tables with foreign keys (actors, films, credits)
Querying connections Optimized for path traversal via MATCH and shortestPath Requires recursive CTEs or multiple joins
Shortest path query 1–2 lines in Cypher, efficient Complex recursive SQL or application-side logic
Performance on deep traversals Fast, native graph traversal Slower, needs multiple joins and careful indexing
Schema flexibility Schema-optional; easy to evolve Rigid schema; requires migrations for changes
Learning curve Graph model and Cypher require some learning SQL is widely known and supported
Tooling and ecosystem Neo4j Browser for graph visualisation Mature RDBMS tools, but not graph-native
Debugging data relationships Intuitive with visual graph tools Requires interpreting JOINs or ERDs manually
Storage efficiency Less compact, higher overhead Very compact and efficient for tabular data
Deployment complexity Requires Neo4j server setup Easier and more portable with PostgreSQL/MySQL
Maturity / portability Newer technology, evolving standards Mature, portable, and widely adopted

Choose Neo4j when:

Choose SQL when:

Conclusion

The Baconometer benefits from Neo4j’s graph-native capabilities, allowing it to perform efficient shortest-path queries with readable, maintainable code.

While the Oracle of Bacon demonstrates that SQL can technically solve the same problem, it requires significantly more effort in both design and query complexity. Neo4j offers a more elegant and scalable solution for this graph-based problem.

Neo4j provides native graph storage and a query language (Cypher) that’s optimized for path-finding. I saw this project as an opportunity to learn neo4j and to experiment with a graph database.

With Neo4j, the query to find the shortest path between two actors is as simple as:

MATCH path = shortestPath(
  (a:Actor {name: 'Kevin Bacon'})-[:ACTED_IN*]-(b:Actor {name: 'Some Other Actor'})
)
RETURN path

This made it a natural choice for the core database.

Server Software

Python was chosen for its ecosystem and familiarity. Flask, in particular, is:

Service Implementation

The Baconometer is a simple web app with a single page and a single API endpoint. The API endpoint is a Flask route that takes two actor names as input and returns the shortest path between them.

I added scripts that enable bootstrapping the neo4j database from either IMDB or TMDB datasets.

I added system tests to validate the API endpoint.

Hosting and Deployment

Initially, I considered managed Neo4j services, but they were expensive and restrictive. I wanted full control over the database and filesystem, especially during debugging and import.

After evaluating AWS, GCP, and some PaaS platforms, I settled on Hetzner Cloud:

Hetzner was much more affordable than AWS and co.

Deployment Setup

The deployment architecture is minimal and simple:

Deployment steps were handled via a small Bash script and manual provisioning for now. Future improvements could include:

Conclusion

The Baconometer is a great example of choosing the right tool for the problem. Graphs are a natural fit, and Neo4j’s expressiveness made the core logic simple. Flask and Python provided a productive backend environment, and Hetzner offered low-cost flexibility for deployment.

This project demonstrates that you can leverage neo4j to provide a performant and scalable solution for a graph problem in a reasonable time frame (this project was completed in a couple of weeks).

I am excited to think about what other problems neo4j can be targetted at.