The Memory Janitor: How We Taught AI to Remember Involuntarily
Every AI assistant you use forgets you the moment the session ends. Not because memory is technically impossible — it isn't. Because the approaches people use to bolt memory onto LLMs are fundamentally fragile.
Here's how it usually goes: you write a system prompt that tells your LLM to use memory tools. It works sometimes. Then it doesn't. The model gets deep into a conversation, loses the thread of its own instructions, and stops calling the tools. Or it stores the wrong things. Or it floods the context with everything it remembers, burying the useful signal. The memory layer becomes noise.
We've been building Sulcus — a thermodynamic memory system for AI agents — for a while now. And we kept running into this problem. Not on our end. On every integration's end. The LLM is bad at being its own memory manager.
Today, we're shipping a solution: the Sulcus Intelligence Unit (SIU).
The Core Problem
The fundamental mistake in current AI memory architectures is asking the main LLM to do two jobs: talk to the user and manage its own memory. These are different skills that require different training signals.
A production LLM like Claude or GPT is trained to be helpful, coherent, and safe. That training doesn't make it good at deciding whether a piece of information has long-term relevance, whether to boost the heat on a memory that keeps recurring, or whether two seemingly unrelated facts should be linked. Memory curation is a specialist task. Generalist models handle it inconsistently.
The SIU is a small, fine-tuned model trained on exactly one thing: Sulcus memory management.
Architecture
The SIU wraps your LLM without replacing it. Think of it as a silent layer that the conversation passes through on the way in and out.
Pre-turn: Before your LLM sees the user's message, the SIU reads the conversation history, queries Sulcus for relevant memories, and constructs a context frame — a structured system prompt injection that gives the LLM the most relevant memory without flooding it.
Post-turn: After the LLM responds, the SIU reads the output and decides what to do with it. Store a new fact? Boost the heat on an existing memory because it keeps coming up? Decay something that turned out to be irrelevant? Relate two nodes because they co-occur? The SIU executes those operations against Sulcus. The main LLM never knows they happened.
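The two-phase loop above can be sketched in a few lines. Everything in this sketch is illustrative: the class names, method names, and the toy in-memory store stand in for the real SIU and Sulcus APIs, which aren't specified here, and the string-matching curation policy is a placeholder for decisions the fine-tuned model actually makes.

```python
# Toy sketch of the SIU wrap loop. All names and policies here are
# hypothetical stand-ins for the real Sulcus/SIU interfaces.
from dataclasses import dataclass, field

@dataclass
class MemoryNode:
    text: str
    heat: float = 1.0  # toy "thermodynamic" relevance score

@dataclass
class ToySulcus:
    nodes: list = field(default_factory=list)

    def query(self, message: str, budget: int = 2):
        # Hottest memories first, capped by a crude budget.
        return sorted(self.nodes, key=lambda n: n.heat, reverse=True)[:budget]

    def store(self, text: str) -> None:
        self.nodes.append(MemoryNode(text))

    def boost(self, node: MemoryNode, amount: float = 0.5) -> None:
        node.heat += amount

def pre_turn(store: ToySulcus, user_message: str) -> str:
    """Build a context frame: relevant memories injected before the LLM runs."""
    frame = "\n".join(f"- {n.text}" for n in store.query(user_message))
    return f"[memory]\n{frame}\n[/memory]\n{user_message}" if frame else user_message

def post_turn(store: ToySulcus, llm_output: str) -> None:
    """Curate after the LLM responds: boost recurring memories, store new ones."""
    for node in store.nodes:
        if node.text.lower() in llm_output.lower():
            store.boost(node)   # it came up again: raise its heat
    store.store(llm_output)     # naive stand-in policy: remember every reply
```

In the real system, both the context-frame construction and the post-turn tactic selection are made by the fine-tuned 3B model rather than by string matching; the point of the sketch is only the shape of the loop, with the outer LLM sitting between `pre_turn` and `post_turn` untouched.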
This is what we mean by involuntary memory. The LLM doesn't need to know Sulcus exists. It doesn't need tool-use instructions. It doesn't need to remember to call anything. It just gets better briefings before each turn, and its outputs get curated after. Memory becomes a property of the system, not a behavior the LLM has to perform.
The SIU is also model-agnostic. You train the curator, not the brain. The SIU speaks Sulcus; the outer model speaks human. It doesn't matter whether you're running Claude, GPT, Gemini, Llama, or anything else. The architecture is the same.
Two Deployment Tracks
We've built the SIU to run in two configurations depending on your constraints.
Cloud SIU runs as a fine-tuned model on serverless infrastructure. Training cost is minimal — under $10 for a full training run. Inference cost is pennies per turn. This is the path for teams that want managed memory without the operational overhead.
Local SIU is Qwen2.5-3B quantized to GGUF Q4_K_M — roughly 1.8GB. It ships embedded in the sulcus-local binary. CPU-only, offline-capable. No cloud dependency whatsoever. This is the path for privacy-sensitive deployments, air-gapped environments, or anyone who doesn't want their memory operations leaving the machine.
Both tracks use the same underlying tactic model, trained on 6,670 examples across 21 memory management categories. The fine-tuning data covers the full range of Sulcus operations: when to store, when to boost, when to decay, how to relate nodes, how to construct context frames under token budget constraints.
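For intuition, here is what one such training record might look like. The field names and category labels below are assumptions for illustration; the post only states that there are 21 tactic categories spanning store, boost, decay, relate, and context-frame construction.

```python
# Hypothetical shape of one SIU fine-tuning example. Field names and
# tactic labels are illustrative, not the actual Sulcus schema.
import json

TACTICS = {"store", "boost", "decay", "relate", "noop"}  # subset, for illustration

example = {
    "conversation_tail": "User: I moved to Lisbon last month.",
    "llm_output": "Congrats on the move! How is Lisbon treating you?",
    "tactic": "store",  # the curation decision the SIU should learn
    "arguments": {"text": "user lives in Lisbon", "initial_heat": 1.0},
}

def validate(record: dict) -> bool:
    """Reject malformed records before they reach the fine-tune set."""
    return (
        record.get("tactic") in TACTICS
        and isinstance(record.get("arguments"), dict)
        and bool(record.get("llm_output"))
    )

assert validate(example)
print(json.dumps(example, indent=2))
```

A dataset of such (situation, tactic) pairs is what makes the task narrow enough for a 3B model: the output space is a small, closed set of operations with structured arguments, not open-ended text.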
Why a Small Model Works Here
The SIU doesn't need to understand the world. It needs to understand Sulcus — a closed, well-defined system with a specific schema, a specific set of operations, and specific decision criteria. That's a narrow task. Narrow tasks fine-tune well on small models.
A 3B-parameter model specialized on Sulcus memory tactics will outperform a 70B generalist on the same task, because the generalist hasn't been trained on it and will hallucinate policies that don't exist. Specificity beats scale here.
This is the same insight that drives tool-use fine-tuning, code generation fine-tuning, and structured output models generally: if you know exactly what you want the model to do, you don't need to ask a model that knows everything to do it.
Infrastructure: pgvector Upgrade
Alongside the SIU work, we shipped a significant infrastructure upgrade to the Sulcus cloud server (api.sulcus.ca). The previous implementation stored embeddings as binary blobs and scanned them linearly — functional at small scale, but not a foundation you'd want to build on. We've migrated to native pgvector with a vector(384) column and an HNSW index (m=16, ef_construction=200, cosine operations). The cloud server now has search parity with sulcus-local, which has used in-memory HNSW since early on.
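The migration boils down to a few statements. The column type and index parameters below come straight from the upgrade described above; the table and column names are assumptions.

```sql
-- Illustrative migration matching the stated parameters; table and
-- column names are hypothetical.
ALTER TABLE memories ADD COLUMN embedding vector(384);

CREATE INDEX memories_embedding_idx
    ON memories
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);

-- Nearest-neighbor lookup: <=> is pgvector's cosine-distance operator,
-- which the HNSW index above accelerates.
SELECT id FROM memories ORDER BY embedding <=> $1 LIMIT 10;
```

Unlike the old linear scan over binary blobs, the HNSW index gives approximate nearest-neighbor search in roughly logarithmic time, so query latency stays flat as the memory store grows.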
We also shipped 28 security vulnerability fixes covering dependencies including sqlx, jsonwebtoken, reqwest, and quinn-proto. Maintenance work, but important.
SDK releases: sulcus@0.3.2 on npm, sulcus==0.3.3 on PyPI.
What This Means for Sulcus
The SIU changes what Sulcus is. Before: a memory backend that LLMs could use if they remembered to. After: a memory system that operates independently of whether the LLM cooperates.
Compared to alternatives — Mem0, Zep, Letta — Sulcus already had differentiated architecture: thermodynamic decay (memories have physics, not just timestamps), self-hosted first, CRDT sync for cross-agent memory meshes, MCP native for Claude Desktop. The SIU adds something none of them have: a dedicated memory intelligence layer that doesn't depend on the outer model's tool-use discipline.
The key insight driving this: if you could fine-tune a frontier model on Sulcus memory tactics, memory would be reflexive for that model, not bolted on. Frontier models are closed. The SIU is the workaround — the specialist layer that does what the general model should, but doesn't.
What's Next
The current deployment is focused on validation — confirming the SIU's tactic selections match expert human judgment on a held-out eval set, measuring context frame quality under budget constraints, and benchmarking inference latency on CPU for the local path.
From there: integration into the Sulcus plugin ecosystem, so agent sessions get SIU-managed memory automatically. Then SDK-level support, so teams building on sulcus-local or the cloud API can opt into SIU-backed memory without writing their own curation logic.
Memory that works without trying. That's the goal. We're close.