Notes on Recommendation: A Necessary Detour

Published on Jan 03, 2026

I recently wrote about building a film recommendation engine designed to encourage discovery beyond streaming catalogues. That project is still the destination.

However, I've decided to take a side-step: to build some smaller tools and experiments in the same domain before continuing with the original idea.

A Constrained Experiment

The working title for the first experiment is related.pictures. At its simplest, it's a deliberately constrained recommendation system: you pick a starting film, and it suggests a small number of related films based on shared cast and crew. No ratings. No genres. No attempt to infer who you are. Just a single creative artefact used as a point of entry.
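To make the idea concrete, here is a minimal sketch of ranking films by shared cast and crew. It is an illustration only, not the code behind related.pictures: the film titles, the people, the `CREDITS` mapping, and the `related_films` function are all hypothetical, and scoring by a raw count of shared people is an assumption chosen for simplicity.

```python
# Hypothetical credits data: film title -> set of credited people (cast and crew).
CREDITS = {
    "Film A": {"Director X", "Actor P", "Composer M"},
    "Film B": {"Director X", "Actor P", "Editor E"},
    "Film C": {"Actor P", "Actor Q"},
    "Film D": {"Actor R"},
}


def related_films(seed, credits, limit=3):
    """Rank other films by how many credited people they share with the seed."""
    seed_people = credits[seed]
    scored = []
    for title, people in credits.items():
        if title == seed:
            continue
        shared = seed_people & people  # set intersection: people in both films
        if shared:
            scored.append((title, len(shared)))
    # Highest overlap first; ties broken by title so the order is stable.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:limit]


print(related_films("Film A", CREDITS))
# [('Film B', 2), ('Film C', 1)]
```

Nothing about the viewer appears anywhere in the input, which is the point: the seed film is the only signal.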

This approach side-steps a common problem in recommender systems: the cold start. In a standard recommendation engine, a cold start occurs when the system lacks enough historical data to generate reliable suggestions for a brand-new user. Because related.pictures generates recommendations solely from the chosen film, there's no need to build a viewer profile or guess what to show someone before knowing anything about their taste.

What makes related.pictures interesting to me is that it gives me a constrained environment in which to ask - and actually answer - questions about recommender systems before committing those answers to a larger, more ambitious product.

It lets me explore what happens when recommendations are driven by people rather than platforms. When similarity is derived from creative collaboration instead of catalogue availability. When discovery isn't bounded by what a service happens to be promoting this week. Crucially, it lets me do this without pretending I've already solved the hard problems.

The Ingestion Pipeline

To support that exploration, I've been building a small ingestion and processing pipeline: tooling that periodically gathers film data from public sources, ensures I retain the data only for as long as I'm legally allowed to, and produces a dataset that related.pictures can work with.
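One way to picture the retention step is as a filter that drops any record older than its source allows. The sketch below is an assumption about how such a step could look, not a description of the actual pipeline: the source names, the retention windows, the `Record` type, and `prune_expired` are all hypothetical, and real limits would come from each source's terms of use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows per source. None means "no expiry" in this sketch.
RETENTION = {
    "source_a": None,
    "source_b": timedelta(days=30),
}


@dataclass(frozen=True)
class Record:
    source: str
    film_id: str
    fetched_at: datetime


def prune_expired(records, now, retention=RETENTION):
    """Keep only records that are still inside their source's retention window."""
    kept = []
    for record in records:
        window = retention[record.source]
        if window is None or now - record.fetched_at <= window:
            kept.append(record)
    return kept


now = datetime(2026, 1, 3, tzinfo=timezone.utc)
records = [
    Record("source_a", "film-1", now - timedelta(days=400)),  # no expiry: kept
    Record("source_b", "film-2", now - timedelta(days=10)),   # inside window: kept
    Record("source_b", "film-3", now - timedelta(days=60)),   # too old: dropped
]
print([r.film_id for r in prune_expired(records, now)])
# ['film-1', 'film-2']
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the step repeatable and easy to test.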

I initially used Junie to produce a proof of concept in Python, based on data from Wikidata. The tool worked well enough to demonstrate that the idea was worth pursuing, so I'm now developing a production-ready version called `open-cinema-index`, which is available on GitHub. It will pull information from both Wikidata and TMDb initially, and I eventually plan to enrich films with data from other sources, such as film certification bodies (e.g. the BBFC).

The system runs quietly, recomputing recommendations overnight, and posts results to a simple API that the public-facing application can consume. This is not about real-time personalisation or infinite scroll. It's about determinism, traceability, and understanding why a particular recommendation exists at all.
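Determinism and traceability can both be expressed in the shape of the output itself. The sketch below is a guess at what such a payload might look like, not the project's actual API format: the `build_payload` function and the `seed`, `recommendations`, `film`, and `because` field names are all hypothetical. Each recommendation carries the list of shared people that produced it, and every ordering is explicit, so the same input always serialises to the same string.

```python
import json


def build_payload(seed, candidates):
    """Serialise recommendations with their reasons, in a fully fixed order.

    candidates maps a film title to the set of people it shares with the seed.
    """
    recommendations = [
        {"film": title, "because": sorted(people)}  # sorted: sets have no order
        for title, people in candidates.items()
    ]
    # Most shared people first; ties broken by title.
    recommendations.sort(key=lambda rec: (-len(rec["because"]), rec["film"]))
    return json.dumps(
        {"seed": seed, "recommendations": recommendations},
        sort_keys=True,
    )


first = build_payload("Film A", {
    "Film C": {"Actor P"},
    "Film B": {"Director X", "Actor P"},
})
second = build_payload("Film A", {
    "Film B": {"Actor P", "Director X"},
    "Film C": {"Actor P"},
})
print(first == second)  # True: input order does not affect the output
```

Because two runs over the same data produce identical output, a changed recommendation can always be traced back to a change in the underlying credits.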

I am building this tool to serve as the foundation for future experiments beyond related.pictures.

This is the first post in a series about the project. Some posts will be technical. Some will be reflective. Most will sit somewhere in between. The focus will be on the decisions that shape recommender systems long before machine learning enters the picture: data selection, similarity signals, update cadence, and the trade-offs that quietly determine what discovery even means.

None of this replaces the end-goal product. It exists so that when I return to it, I'll be doing so with fewer assumptions and better instincts.

A Series on Discovery

I'll be continuing this series by exploring how related.pictures behaves as a system. If you've worked on recommender systems or film data, or have thoughts on alternative approaches to discovery, I'd love to hear from you.

You can find the code and related tooling under the project-watchlist organisation on GitHub, including the current ingestion tool at https://github.com/project-watchlist/open-cinema-index.

Get in Touch

I'm currently looking for thoughts on how to optimise data fetching and handle API access ethically at scale. If you have experience with building resilient ingestion pipelines or navigating the etiquette of public data sources, I'd love to pick your brain. Get in touch via email.