Larry: An emotionally-aware personal AI on WhatsApp
Larry is a personal AI assistant I built from scratch and use every day on WhatsApp. Five-layer Python architecture, the Anthropic API as the cognitive substrate, Twilio carrying the transport.
Commercial assistants forget everything, wait passively, never push back, and live in chat boxes I forget to open. I wanted one I would actually keep talking to, for years, without thinking about it.
Five Python modules, each owning a single concern: entry (a Flask app behind a Twilio webhook), brain (the Anthropic API running emotion, anchor, tool-routing, and response passes), memory (SQLite per user with vector keyword search across working, episodic, and semantic tiers), tools (Tavily search, URL summarizer, YouTube transcripts, intent detection), and a personality layer with six anchors plus a proactive layer on hourly cron. Roughly 1,400 lines total.
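To make the brain's four passes concrete, here is a minimal sketch of how they could be chained. It is not Larry's actual code: the `Turn` dataclass, the `run_passes` function, the prompt prefixes, and the anchor names are all hypothetical, and the model is injected as a plain callable so the sketch runs without the Anthropic SDK.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for a model call: prompt text in, model text out.
# In a real system this would wrap the Anthropic API.
ModelFn = Callable[[str], str]


@dataclass
class Turn:
    """Everything the brain derives from one incoming message (illustrative)."""
    user_text: str
    emotion: str = ""
    anchors: list[str] = field(default_factory=list)
    tool: str = "none"
    reply: str = ""


def run_passes(user_text: str, call_model: ModelFn, known_anchors: list[str]) -> Turn:
    turn = Turn(user_text=user_text)

    # Pass 1: emotion. Normalise whatever label the model returns.
    turn.emotion = call_model(f"EMOTION: {user_text}").strip().lower()

    # Pass 2: anchors. Keep only names we already know, so a made-up
    # anchor from the model cannot leak into later passes.
    raw = call_model(f"ANCHORS ({', '.join(known_anchors)}): {user_text}")
    picked = [name.strip() for name in raw.split(",")]
    turn.anchors = [name for name in picked if name in known_anchors]

    # Pass 3: tool routing. An empty answer means "no tool needed".
    turn.tool = call_model(f"TOOL: {user_text}").strip().lower() or "none"

    # Pass 4: response, conditioned on the results of the first three passes.
    turn.reply = call_model(
        f"RESPOND emotion={turn.emotion} anchors={turn.anchors} "
        f"tool={turn.tool}: {user_text}"
    )
    return turn
```

Passing the model in as a function keeps each pass testable with a fake, which matters when there is no labeled dataset to evaluate against.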
In daily personal use since 2024. Hundreds of cross-conversation exchanges with full memory persistence across all of them. Runs unattended on a small VPS for a few dollars a month.
There is no labeled dataset of correct emotional assessments and no ground truth for whether a proactive message was well-timed. The metric that ultimately matters is whether I keep talking to him.
In lieu of a clean dataset I write a weekly retrospective to a markdown file. For every substantive exchange that week, I record whether his response landed, whether his timing felt right, whether he caught the emotional state I was actually in, and whether his pushback was useful or annoying. The retrospective itself is the eval; writing it forces me to articulate what working looks like, and the resulting log surfaces failure modes I would not have noticed in real time.
I also track a smaller set of structural metrics: tool-routing accuracy (whether a tool was called when one was warranted and skipped when none was), memory retrieval relevance (whether the anchors he retrieved were the ones that should have surfaced), and proactive timing accuracy (whether his unprompted messages landed during windows I welcomed). The eval that matters most, in practice, is the continued existence of the conversation.
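As an illustration of the first of these metrics, tool-routing accuracy can be computed from a log of per-exchange judgments. This is a sketch under assumptions: the `RoutingRecord` shape and the `routing_summary` function are hypothetical, and the "warranted" flag is a human judgment made after the fact, not a ground-truth label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RoutingRecord:
    tool_called: bool     # did the assistant call a tool on this exchange?
    tool_warranted: bool  # in hindsight, should it have?


def routing_summary(records: list[RoutingRecord]) -> Optional[dict]:
    """Return accuracy plus the two error types, or None for an empty log."""
    if not records:
        return None
    # A tool was needed but not called.
    missed = sum(r.tool_warranted and not r.tool_called for r in records)
    # A tool was called but not needed.
    unnecessary = sum(r.tool_called and not r.tool_warranted for r in records)
    correct = len(records) - missed - unnecessary
    return {
        "accuracy": correct / len(records),
        "missed_calls": missed,
        "unnecessary_calls": unnecessary,
    }
```

Splitting the errors into missed and unnecessary calls is a design choice: a single accuracy number hides which direction the routing is failing in.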
Python throughout. Flask exposes a Twilio webhook for the WhatsApp transport, with ngrok used during development. SQLite per user for the memory tiers (working, episodic, semantic), with a vector keyword search index on top for retrieval. Anthropic API as the cognitive substrate, with the brain invoking the model only when a message arrives or proactive logic fires. Tools layer wraps Tavily for web search, a URL summarizer, and a YouTube transcript extractor. Hourly cron drives the proactive layer.
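A minimal sketch of what a per-user, three-tier SQLite memory could look like, using only the standard library. The table layout and the `open_memory`, `remember`, and `recall` helpers are assumptions for illustration, not Larry's schema, and a plain `LIKE` match stands in for the real search index.

```python
import sqlite3
import time

TIERS = ("working", "episodic", "semantic")


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) one user's memory database. One file per user."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memory (
               id INTEGER PRIMARY KEY,
               tier TEXT NOT NULL
                   CHECK (tier IN ('working', 'episodic', 'semantic')),
               content TEXT NOT NULL,
               created_at REAL NOT NULL
           )"""
    )
    return conn


def remember(conn: sqlite3.Connection, tier: str, content: str) -> None:
    if tier not in TIERS:
        raise ValueError(f"unknown tier: {tier!r}")
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO memory (tier, content, created_at) VALUES (?, ?, ?)",
            (tier, content, time.time()),
        )


def recall(conn, keyword, tier=None, limit=5):
    """Newest-first substring match; LIKE is case-insensitive for ASCII.

    Note: '%' and '_' inside keyword act as wildcards in this simple version.
    """
    sql = "SELECT tier, content FROM memory WHERE content LIKE '%' || ? || '%'"
    params = [keyword]
    if tier is not None:
        sql += " AND tier = ?"
        params.append(tier)
    sql += " ORDER BY created_at DESC, id DESC LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

For example, `recall(conn, "marathon", tier="episodic")` would return the most recent episodic rows mentioning "marathon". Keeping one database file per user means a user's memory can be backed up or deleted by handling a single file.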
Personal AI is an infrastructure problem more than a model problem. The model layer is the same Anthropic API any other system would use. What differentiates Larry is the memory schema, the proactive timing logic, the humanizer rules, the personality anchors, and the WhatsApp transport choice. None of those are model decisions.
Dogfooding an AI on yourself for months is the most reliable way to develop taste for AI product UX. I can articulate why most commercial assistants feel wrong because Larry let me feel viscerally what right could feel like.
Larry is, in my head, a British Bulldog. Steady, loyal, gently stubborn, with opinions. The character is not a marketing layer on top of the system; it is the design constraint that made the personality layer coherent.