The Watchlist Project: Building a Recommendation Engine for the Unexpected
I've started work on something I've wanted for years but never found anywhere in the wild: a recommendation engine that doesn't just show me more of the same. I want something that nudges me toward films and TV shows I'd never normally pick. Not the popular stuff, not the algorithmically obvious stuff - something far more personal, and a little bit unexpected.
For now, I'm calling it the Watchlist Project.
The basic features are what you'd expect. I'll be logging what I've watched, what I'm watching, and what I plan to watch. That gives me the raw data to chew on. But that isn't the grand idea. The real goal is to build a system that can form a kind of "taste profile" from that data - not a demographic caricature ("you watched a sci-fi film, here are nine more sci-fi films") but a more nuanced sense of mood, tone, themes, pacing, emotional temperature, and the kinds of creative risks I tend to gravitate toward.
I want a system that can say:
"You gravitate toward character-driven stories with moral tension and a hint of cosmic unease. Try this obscure Norwegian gem from the 90s. Trust me."
The Core Idea: A Taste Profile That Actually Means Something
Instead of pigeonholing me into genres, the system will aim to capture things that matter more:
- Mood and emotional temperature
- Themes and ideas beneath the surface
- Story structure and pacing
- Creative risk and stylistic fingerprints
- Patterns that aren't obvious but feel intuitive when surfaced
The goal isn't "more sci-fi because you watched sci-fi."
The goal is "here's something you never would have clicked on, but will absolutely enjoy."
The Tech Behind the Scenes
I spend most of my professional life in PHP and web development, so naturally that's where the backbone of this project starts. But I'll use whatever languages or tools are appropriate for the task at hand.
I'll be breaking the system architecture into the following stages.
Data Ingestion Layer
Pulling raw data from the following sources:
- Manual entries (films watched, in progress, want to watch)
- External APIs (TMDB primarily)
- Optional metadata sources (Letterboxd CSV export, IMDb ratings, etc.)
The ingestion layer normalises:
- Titles
- IDs
- Genre tags
- Cast/crew information
- Keywords
- Release data
Think of it as a very opinionated ETL pipeline for metadata and user behaviour.
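Here's a rough sketch of what that normalisation step might look like. It's in Python rather than PHP purely for illustration, and everything named here is a placeholder: the `Title` schema, the `normalise_title` function, and the shape of the raw dict (flat lists of genre and keyword strings) are assumptions, not the actual TMDB response format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Title:
    """Normalised record shared by every later stage (hypothetical schema)."""
    tmdb_id: int
    title: str
    release_date: Optional[date]
    genres: Tuple[str, ...] = ()
    keywords: Tuple[str, ...] = ()


def _clean_tags(values) -> Tuple[str, ...]:
    # Lower-case, trim, drop empties and duplicates, and sort for a stable order.
    return tuple(sorted({v.strip().lower() for v in values if v.strip()}))


def normalise_title(raw: dict) -> Title:
    """Map one raw source record (assumed shape) onto the common Title schema."""
    # Collapse stray whitespace so "  Some  Film " and "Some Film" compare equal.
    title = " ".join(str(raw.get("title", "")).split())

    # Only ISO dates (YYYY-MM-DD) are handled; anything else becomes None.
    raw_date = raw.get("release_date") or ""
    try:
        released = date.fromisoformat(raw_date)
    except ValueError:
        released = None

    return Title(
        tmdb_id=int(raw["id"]),
        title=title,
        release_date=released,
        genres=_clean_tags(raw.get("genres", [])),
        keywords=_clean_tags(raw.get("keywords", [])),
    )
```

Each source (manual entry, API, CSV export) would need its own small adapter that produces this raw dict, so the rest of the pipeline only ever sees one shape.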
Feature Extraction
This is where the fun begins.
Potential features extracted per title:
- Thematic embeddings (using an LLM or other ML model to generate a semantic vector covering plot, tone, and pacing)
- Crew signatures (directorial style, writer patterns)
- Narrative archetypes (hero's journey, anti-structure, character-driven drama)
- Genre/keyword weighting
- Temporal features (era, runtime, production region)
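To make the genre/keyword weighting idea a bit more concrete, one possible approach is a smoothed inverse document frequency, so that rare keywords count for more than ubiquitous ones. This is only a sketch of one option, not a settled design; `keyword_weights`, `weighted_profile`, and the catalogue shape (title ID mapped to a list of keywords) are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def keyword_weights(catalogue: Dict[str, List[str]]) -> Dict[str, float]:
    """Return a smoothed inverse-document-frequency weight per keyword."""
    n_titles = len(catalogue)
    # Count how many titles each keyword appears in (at most once per title).
    doc_freq = Counter(kw for kws in catalogue.values() for kw in set(kws))
    return {
        kw: math.log((1 + n_titles) / (1 + df)) + 1.0
        for kw, df in doc_freq.items()
    }


def weighted_profile(keywords: Iterable[str], weights: Dict[str, float]) -> Dict[str, float]:
    """Keyword-to-weight map for one title, normalised so the weights sum to 1."""
    present = {kw: weights[kw] for kw in set(keywords) if kw in weights}
    total = sum(present.values())
    return {kw: w / total for kw, w in present.items()} if total else {}
```

With this weighting, a keyword attached to nearly every title contributes little to a title's profile, while a keyword shared by only a handful of titles dominates it.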
Potential features extracted per user:
- Taste-vector average (mean embedding of watched titles)
- Taste clusters (distinct "modes" of taste detected via clustering)
- Novelty tolerance (how far recommendations can deviate from known tastes)
- Serendipity bias (how often obscure or low-visibility titles should be surfaced)
- Diversity appetite (cross-genre, cross-era trends)
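The taste-vector average is the simplest of these to sketch. The snippet below assumes each watched title already has an embedding of the same length; `mean_vector` and `taste_spread` are hypothetical names, and using the average distance from the centre as a stand-in for diversity appetite is my own simplification, not a finished definition.

```python
import math
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of the watched-title embeddings."""
    if not vectors:
        raise ValueError("need at least one embedding")
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dims)]


def taste_spread(vectors: Sequence[Sequence[float]], centre: Sequence[float]) -> float:
    """Average Euclidean distance from the centre: a crude proxy for diversity."""
    return sum(math.dist(v, centre) for v in vectors) / len(vectors)


# Toy two-dimensional embeddings, purely for illustration.
watched = [[1.0, 0.0], [0.0, 1.0]]
centre = mean_vector(watched)
spread = taste_spread(watched, centre)
```

A single mean can blur two very different tastes into a point that represents neither, which is exactly why the taste clusters sit alongside it in the list above.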
Recommendation Engine
This is the core algorithmic component. Some early approaches under consideration:
- Vector similarity search across thematic embeddings - Cosine similarity on content embeddings rather than genre labels.
- Weighted matrix factorisation - Not to predict ratings, but to estimate interest likelihood from behavioural patterns.
- "Friction Distance" Metric - A custom measure that defines how far a recommendation can safely stray from the user's tastes while still feeling relevant.
- Clustering-based discovery - If I have distinct clusters (e.g., "slow sci-fi" and "British social realism"), the engine should occasionally bridge the gap with hybrid or unexpected picks.
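As a first sketch of how a few of these might fit together: cosine similarity for the search, a distance band as a placeholder for the friction distance, and a simple bridging score for two taste clusters. I haven't pinned down what friction distance really means yet, so treating it as a minimum and maximum cosine distance is just an assumption for this example, as are the function names and default thresholds.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(
    taste: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    min_distance: float = 0.1,   # assumed threshold: closer than this is too obvious
    max_distance: float = 0.5,   # assumed threshold: further than this is too alien
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Return (title, distance) pairs inside the band, most distant first."""
    scored = []
    for title, embedding in candidates.items():
        distance = 1.0 - cosine_similarity(taste, embedding)
        if min_distance <= distance <= max_distance:
            scored.append((title, distance))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


def bridge_picks(
    centroid_a: Sequence[float],
    centroid_b: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    limit: int = 5,
) -> List[Tuple[str, float]]:
    """Rank candidates by their weaker similarity to two cluster centroids."""
    scored = [
        (title, min(cosine_similarity(centroid_a, emb), cosine_similarity(centroid_b, emb)))
        for title, emb in candidates.items()
    ]
    # A high score means the title sits reasonably close to both clusters.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]
```

Sorting the band from most to least distant deliberately favours the most surprising titles that are still within reach. Scoring bridges by the weaker of the two similarities means a title that belongs firmly to one cluster can't score well just by being very close to it.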
This engine isn't trying to maximise watch-time. It aims for human-scale delight: the joy of discovering something that feels impossibly tailored yet surprising.
Presentation Layer
Initially a simple web app. Later:
- Personalised watchlist management
- A recommendation feed
- Title exploration pages
- Taste-profile visualisations
- A transparent "why this recommendation?" breakdown
- Maybe even a public API or data export
The interface is secondary to the idea, but still important. Discovery feels better when the UI feels like a curious companion rather than a corporate slot machine.
Why Build This Myself?
Because mainstream systems optimise for engagement, not discovery. Their goals aren't my goals.
I want recommendations that understand:
- Why I love Waiting for Bojangles and Finding Nemo in different ways
- That my interest in 90s sci-fi doesn't mean I want more nostalgia
- That my "comfort watches" don't define me
- That I crave strange, small, forgotten films just as much as the big blockbusters
The Watchlist Project is an attempt to build a tool that respects the weird knots of human taste.
Why I'm Sharing This
I want this blog to be more than a diary. By writing about the Watchlist Project publicly:
- I hope to attract collaborators or like-minded developers who might want to join the journey.
- I want to show my thought process and skills in a way that's concrete and project-based.
- And I want to document the evolution of a tech project from idea to execution, with all the mistakes, experiments, and breakthroughs along the way.
If you're curious about building something similar, enjoy following tech projects, or just love films as much as I do, I hope you follow along.
Stay tuned. There's a lot to build - and a lot to watch.