A journey into why personal AI is harder than it looks

Aug 23, 2025

I wanted to build my daemon.

Not just another chatbot, but something closer to a cognitive companion that knows me completely while maintaining its own perspective. Something that understands my thinking patterns, remembers my intellectual journey, and challenges me from a position of deep familiarity with who I am.

The concept felt achievable. I’d been self-hosting services for years, running media servers, home automation, and other tinkering projects. Adding AI infrastructure seemed like a natural extension. Privacy wasn’t just a nice-to-have - it was fundamental: building something that truly knows you requires exposing your most intimate cognitive patterns. That data couldn’t live on someone else’s servers.

My initial assumption was elegantly simple: build a smart local database of “everything about me,” give a good language model access to it, and voilĂ  - my own daemon. I had already been experimenting with Claude and ChatGPT, learning how conversations build context beyond just the immediate prompt. If I could construct the right context locally, I could even randomize API calls between different providers to obfuscate any single conversation while maintaining continuity.

As someone who designs business systems and ERP implementations for a living, I approached this like any other integration project: understand the components, map the data flows, build the connections. What could go wrong?

Everything, it turned out. But not in the ways I expected.

The initial enlightenment

My journey actually began a year ago with a much simpler experiment. I built a Telegram bot that would take my prompts and send them simultaneously to both OpenAI and Anthropic’s APIs, displaying both responses side by side. Originally, I just wanted to compare outputs, to see how different models handled the same questions.
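In spirit, the bot was little more than a fan-out. Here is a minimal sketch of that idea, assuming the python-telegram-bot, openai, and anthropic client libraries - the model names, token placeholder, and wiring are illustrative, not the actual bot I built:

```python
# Minimal "same prompt to two providers" Telegram bot (illustrative sketch).
from telegram import Update
from telegram.ext import ApplicationBuilder, ContextTypes, MessageHandler, filters
from openai import OpenAI
from anthropic import Anthropic

openai_client = OpenAI()        # reads OPENAI_API_KEY from the environment
anthropic_client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

async def handle(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    prompt = update.message.text

    # Same prompt to both providers; calls are kept synchronous for simplicity.
    gpt = openai_client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    claude = anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )

    # Both answers side by side, which was the whole point of the experiment.
    await update.message.reply_text(
        f"OpenAI:\n{gpt.choices[0].message.content}\n\n"
        f"Anthropic:\n{claude.content[0].text}"
    )

app = ApplicationBuilder().token("TELEGRAM_BOT_TOKEN").build()  # placeholder token
app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle))
app.run_polling()
```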

But over months of playing with it, while also actively using Claude and ChatGPT, I discovered something more fundamental: the magic was not (only) in the models themselves, but in how context accumulated. Each response built on previous exchanges. The AI developed a sense of my communication style, my interests, the kinds of problems I was trying to solve. What made these conversations compelling was not the intelligence of any single response, but the continuity that emerged over time.

This was my first real understanding of what “memory” meant for AI - not just storing facts, but building a shared conversational context that made each interaction richer than the last. It planted the seed of a bigger idea: what if, instead of just conversation memory, an AI could have complete knowledge of who I am?

Building the foundation

Fast forward to a few months ago. Armed with insights from the Telegram experiment, I set out to build something more ambitious. Using n8n as an orchestration platform on my local machine, I connected three different AI providers: Anthropic’s Claude, OpenAI’s ChatGPT, and local models through Ollama running on my PC.

The architecture was clean. Every conversation, regardless of which model handled it, was stored in a PostgreSQL database with a consistent schema. I could even start a conversation with ChatGPT on Monday, continue it with Claude on Wednesday, and reference both while talking to my local Llama model on Friday. The context flowed seamlessly between them.
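Roughly, the storage was a single message log keyed by conversation rather than by provider. The sketch below is illustrative - the table and column names are choices I’m making for this post, not a dump of the actual n8n workflow:

```python
# Illustrative provider-agnostic conversation store (not the real schema).
import psycopg2

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id              BIGSERIAL PRIMARY KEY,
    conversation_id UUID NOT NULL,            -- same conversation across providers
    provider        TEXT NOT NULL,            -- 'anthropic', 'openai', 'ollama'
    model           TEXT NOT NULL,
    role            TEXT NOT NULL,            -- 'user' or 'assistant'
    content         TEXT NOT NULL,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);
"""

def load_history(conn, conversation_id: str) -> list[dict]:
    """Rebuild the full message history, regardless of which model produced it."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT role, content FROM messages "
            "WHERE conversation_id = %s ORDER BY created_at",
            (conversation_id,),
        )
        return [{"role": role, "content": content} for role, content in cur.fetchall()]

conn = psycopg2.connect("dbname=daemon")  # placeholder connection string
with conn, conn.cursor() as cur:
    cur.execute(SCHEMA)
```

Because any model can be handed the same ordered message list, Monday’s ChatGPT exchange and Wednesday’s Claude exchange are just rows in the same conversation.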

Unlike the ephemeral conversations you have with web interfaces, my system maintained perfect memory. Every nuance, every tangent, every half-formed thought was preserved and accessible. I could pick up conversations weeks later exactly where I’d left off. More importantly, I could ask one model about something I’d discussed with another - “Remember when we talked about that database architecture problem?” actually meant something, even across different AI providers. The user experience was still very rough, but it was working.

The technical implementation was surprisingly straightforward. A few weeks of evening tinkering, and I had a working multi-model chat system with persistent memory. It felt like I was 70% of the way to my goal. The foundation was solid - now I just needed to feed it knowledge about me.

The knowledge wall

That’s when I hit the first real conceptual wall. As a next step, I wanted to feed it some of the books I had read. Why? Because I had them in digital form: around 500 books sitting in ePub format on my hard drive - books I had read, annotated, thought about. Books that meant something to me. Books that had shaped my thinking over decades. Surely these should be part of my daemon’s knowledge?

But what does that actually mean?

Do I want my daemon to quote these books back to me? No, I can search for quotes myself. Do I want it to “understand” them? Yes, but understanding a book the way I do is different from understanding it objectively. And this applies to fiction too: when I read “The Martian” by Andy Weir, it resonates because of my physics background, but it also illustrates superbly how we tackle big problems at work, breaking them down into small, swallowable chunks; when I read the “Murderbot” series by Martha Wells, it sheds a very different light on this whole daemon project; when I read the “Game of Thrones” series by George R. R. Martin, I am more interested in the power plays and the way some of the characters evolve than in the depiction of epic battles.

This led to a maddening catch-22: to extract personally meaningful knowledge from these books, the system processing them would need to already understand me - my experiences, my intellectual history, how my mind makes connections. But it could not understand me without having processed the content that shaped me. It’s a bootstrap problem with no clear solution.

I explored “chemical reduction” - using AI to distill books down to their intellectual essence. But whose essence? The model doing the reduction would impose its own understanding, not mine. Even if I could fine-tune a model on my own writing first, it would still be interpreting the books through its training, not through my lived experience.

Then there was the cost reality check. Processing 500 books on my local machine would take weeks; using cloud APIs would run into thousands of dollars. Not impossible, but certainly not something I could just experiment with to see whether the result was useful. The expense forced a value question I could not answer: what would I actually gain from having my daemon “know” these books?
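For a sense of scale, here is the back-of-envelope arithmetic behind that “thousands of dollars”. Every number is an assumption (average book length, a rough tokens-per-word ratio, illustrative frontier-model prices), not a measurement:

```python
# Back-of-envelope cost of one analytical pass over the library.
# All figures below are assumptions for illustration only.
books = 500
words_per_book = 100_000             # assumed average length of a full book
tokens_per_word = 1.3                # common rule of thumb for English text

input_tokens = books * words_per_book * tokens_per_word   # ~65M tokens
output_tokens = books * 5_000                             # assumed summary per book

price_in = 15.0    # assumed USD per million input tokens (frontier-class model)
price_out = 75.0   # assumed USD per million output tokens

cost = input_tokens / 1e6 * price_in + output_tokens / 1e6 * price_out
print(f"{input_tokens / 1e6:.0f}M input tokens, one pass ≈ ${cost:,.0f}")
# A bit over $1,100 for a single pass - and anything resembling "understanding"
# would need several passes: chunking overlap, cross-referencing, re-summarization.
```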

The scale wall

The book problem was just the beginning. My digital footprint sprawls across dozens of platforms - forum posts spanning two decades, blog articles, work documents, code repositories, email archives. Getting this data proved surprisingly difficult.

Forum scraping became a weeks-long project in itself. Different platforms, different authentication methods, different ways of hiding your own content from you. Even when successful, I was left with thousands of posts that needed processing. Should my daemon know about my heated debates about French politics from the late 2000s? What about my evolving thoughts on freeride skiing across hundreds of gear discussions?

Local processing hit hard limits. My consumer-grade GPU could run respectable models, but processing hundreds of thousands of documents would take months. Cloud APIs could handle the scale, but at prohibitive cost - and that assumed I was comfortable sending my entire intellectual history to OpenAI’s or Anthropic’s servers.

The more I worked on the problem, the more I realized the scale was not just about compute power or API costs. It was about the fundamental question of what constitutes “me” in data terms. Every forum post, every email, every document is a snapshot of who I was at that moment. How do you synthesize thousands of these snapshots into a coherent understanding of a person?

The privacy paradox

Even if I solved the knowledge problem, a deeper issue emerged. My original vision assumed I could maintain privacy by keeping my knowledge local and only sending anonymized prompts to cloud APIs. But that’s not how language models work.

To use my carefully curated local knowledge, I would need to send substantial context with each query. A conversation about relationships would include context drawn from e-mails dating back to my divorce. A conversation about my career would include excerpts from my LinkedIn profile, but also e-mails from past job applications and feedback from recruiters and colleagues. Even without explicitly identifying information, the context itself becomes a fingerprint. The very patterns of thought I wanted my daemon to understand are what make me identifiable.

The alternative - running everything locally - meant accepting dramatically reduced capabilities. My local models, even with above-average computing power, could not match Claude’s or GPT’s reasoning abilities. The daemon I could build privately would be sovereign but stupid. The daemon that could genuinely augment my thinking would necessarily expose my cognitive patterns to external services.

This was not a technical problem to solve but a fundamental architectural trade-off. Privacy and capability exist in tension, not harmony.

The context window problem

Even with perfect knowledge and infinite compute, another limitation emerged: context windows. Current language models can only “attend to” a limited amount of information at once. Even Claude, with its impressive 200K token context, can hold only a tiny fraction of someone’s intellectual history in active memory.

The standard solution is RAG - Retrieval Augmented Generation - where you search for relevant knowledge and inject it into the context. But relevance is relative. When I am thinking about a system design problem, what’s relevant might be a professional conference I attended fifteen years ago, a failed project from five years ago, and an avalanche course that taught me about risk assessment. No embedding similarity search would connect these disparate experiences, yet they are all part of how I approach the problem.
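To make the limitation concrete, here is what that retrieval step boils down to - a sketch using a small local embedding model (the model name and snippets are placeholders). Ranking is driven purely by semantic similarity to the query, which is exactly why the avalanche course never surfaces:

```python
# Minimal RAG retrieval: rank stored snippets by cosine similarity to the query.
# Model name and example snippets are placeholders for illustration.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

snippets = [
    "Notes from a systems-architecture conference, 2010.",
    "Post-mortem of a failed integration project.",
    "Avalanche course notes on assessing risk with incomplete information.",
]
query = "How should I approach this system design problem?"

snippet_vecs = model.encode(snippets, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)[0]

scores = snippet_vecs @ query_vec  # cosine similarity (vectors are normalized)
for rank in np.argsort(-scores):
    print(f"{scores[rank]:.2f}  {snippets[rank]}")
# The conference notes will rank near the top; the avalanche course almost
# certainly won't - even though it shaped how I think about risk in design.
```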

Fine-tuning seemed promising - train a model specifically on my content. But fine-tuning does not create memory, it creates tendencies. The model might learn to respond in my style, but it would not remember our conversations or know about my specific projects.

Tools like Letta (formerly MemGPT) are trying to solve the memory problem by creating sophisticated memory management systems. But they are still trapped in the same paradigm: a base model plus context. As memory grows, it gets compressed, summarized, abstracted. The nuances that make me “me” get lost in the compression. They are solving for infinite conversation memory - and that has value - but not for genuine understanding of a person.

Recognizing the pattern

After months of building, experimenting, and hitting walls, a pattern emerged. Every solution I explored - whether technical, architectural, or conceptual - revealed the same fundamental tension: a daemon needs deep, persistent, evolving knowledge of who I am, and every way of giving it that knowledge runs into the same walls of extraction, scale, privacy, and memory.

Current AI architecture - base models plus context - might be fundamentally unsuited for personal AI. We are trying to build daemons using tools designed for generic question-answering.

The journey continues

The gap between what I built and what I envisioned is not about execution quality or technical competence. It is about fundamental architectural limitations in how we approach personal AI. Maybe personal AI requires training individual models from scratch for each person. Maybe it requires entirely new architectures that don’t separate model from memory. Maybe the path to daemons is not through incremental improvements to current systems but through fundamentally different approaches we haven’t imagined yet.

The commercial AI labs are racing toward AGI, pouring billions into making models marginally smarter. But what if we already have enough intelligence? What if the barrier to personal AI is not model capability but architecture? What if we are optimizing for the wrong metrics entirely?

We are trying to build a close friendship by hiring increasingly knowledgeable consultants. This is not what I am after. The daemon I wanted to build is still worth building. We just might need to reconsider everything about how we build it.