Every agent memory system I tried failed. Vector search, context stuffing, RAG — all broken in production. Here's what I built after 6 months of failures.
I keep a mental count. Eighty-seven times. That's how many times I've corrected my AI coding agent about our package manager. "We use pnpm." "Not npm." "pnpm, remember?"
It never remembers. It can't. Every session starts fresh. Every correction evaporates.
After six months of watching agents forget everything I taught them, I stopped blaming the models and started blaming the architecture. I built and discarded four memory systems before finding one that actually works. This is the story of all of them: what failed, what survived, and the architecture I'm running in production today.
I call this The Amnesia Tax — the cumulative cost of agents that never learn. It's not just wasted tokens. It's wasted time, wasted quality, and the reason agents feel fragile instead of reliable.
The numbers: 60-70% of agent work is re-discovery of things already solved in prior sessions. That's not a guess — I instrumented my own workflows for a month.
Attempt 1: vector search. I threw everything into ChromaDB and used cosine similarity for retrieval. Beautiful in demos, garbage in production. The agent would retrieve memories about "Python deployment" when asked about "deploying Python apps," yet miss the critical memory where the user said "actually, we switched to Go." Semantic similarity ≠ relevance.
Attempt 2: context stuffing. I crammed session history into the system prompt and hit token limits within days. Performance degraded as the context grew: more noise, less reasoning quality.
Attempt 3: RAG. Better than raw vectors, but it still missed corrections. "I hate sushi" on turn 4 wasn't semantically close to "recommend dinner" on turn 87. And there was no forgetting curve: everything had equal weight forever.
Attempt 4: markdown memory. MEMORY.md files that persisted between sessions. This actually worked at small scale, but files grew bloated, there was no decay mechanism, and no way to handle contradictions.
After building and discarding those four approaches, I landed on ECHO: Experience-Correction Hierarchical Organization. It combines ideas from neuroscience (Ebbinghaus forgetting curves, memory consolidation) with practical engineering (TF-IDF + vectors, deduplication, lifecycle management).
```
┌─────────────────────────────────┐
│ SKILL LAYER (compiled patterns) │ ← Proven, reusable, compressed
├─────────────────────────────────┤
│ BELIEF LAYER (Bayesian priors)  │ ← "We use pnpm" (confidence: 0.97)
├─────────────────────────────────┤
│ PATTERN LAYER (observations)    │ ← "User corrected npm→pnpm 3x"
├─────────────────────────────────┤
│ EXPERIENCE LAYER (raw events)   │ ← Individual corrections, outcomes
└─────────────────────────────────┘
```
Experience → Pattern → Belief → Skill. Each layer is progressively more compressed and more confident.
```yaml
# echo-memory.config.yaml
memory:
  capture:
    corrections: true        # "No, use pnpm" → creates experience
    outcomes: true           # task succeeded/failed → creates experience
    preferences: true        # implicit signals from user behavior
  consolidation:
    schedule: "end-of-session"   # like sleep consolidation
    pattern_threshold: 3     # 3 similar experiences → pattern
    belief_threshold: 5      # 5 confirming patterns → belief
    skill_threshold: 10      # 10 successful applications → skill
  decay:
    enabled: true
    half_life_hours: 168     # unreinforced memories fade in ~1 week
    reinforcement_multiplier: 2  # each access doubles remaining life
  retrieval:
    methods: ["tfidf", "vector", "recency"]  # hybrid, not vector-only
    dedup_threshold: 0.85    # collapse similar memories
    max_results: 10
```
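The decay settings above map onto an Ebbinghaus-style exponential: a memory's weight halves every `half_life_hours` unless it gets accessed, and each access stretches the half-life. A minimal sketch of that scoring; the function name and signature are mine, not part of ECHO's actual API:

```python
import time

HALF_LIFE_HOURS = 168          # from the config: unreinforced memories fade in ~1 week
REINFORCEMENT_MULTIPLIER = 2   # each access doubles the effective half-life

def decay_weight(last_access_ts, reinforcements=0, now=None):
    """Exponential decay: weight halves once per effective half-life.

    Each reinforcement multiplies the half-life, so frequently used
    memories fade more slowly. Hypothetical helper for illustration.
    """
    now = now or time.time()
    hours_idle = (now - last_access_ts) / 3600
    effective_half_life = HALF_LIFE_HOURS * (REINFORCEMENT_MULTIPLIER ** reinforcements)
    return 0.5 ** (hours_idle / effective_half_life)

week_ago = time.time() - 168 * 3600
fresh = decay_weight(time.time())               # just accessed: weight ~1.0
stale = decay_weight(week_ago)                  # one half-life idle: weight ~0.5
reinforced = decay_weight(week_ago, reinforcements=1)  # doubled half-life: ~0.71
```

At retrieval time, this weight multiplies the relevance score, which is what lets old, unreinforced corrections fall out of the top-10 instead of competing forever.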
❌ Every other agent memory system:
```
# Session 1
user: "We use pnpm, not npm"
agent: "Got it, using pnpm!"

# Session 2
agent: "Running npm install..."   # Amnesia. Every. Time.
```
✅ ECHO architecture:
```python
# Session 1: correction captured
experience = {
    "type": "correction",
    "wrong": "npm",
    "right": "pnpm",
    "scope": "package_management",
    "confidence": 0.8,
}

# Session 2: pattern recognized (2nd correction + initial)
pattern = {"rule": "always_use_pnpm", "observations": 2, "confidence": 0.9}

# Session 5: belief compiled (never corrected again)
belief = {"fact": "pnpm_not_npm", "confidence": 0.97, "decay": "frozen"}

# Session 10+: skill compiled
skill = {"name": "package_management", "rules": ["use pnpm", "lockfile: pnpm-lock.yaml"]}
```
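A compiled belief only prevents the `npm install` mistake if it reaches the model. The recall step at session start amounts to rendering high-confidence beliefs into a system-prompt preamble. A sketch of what I mean; the 0.9 cutoff and the wording are my own illustration, not a fixed part of ECHO:

```python
def render_beliefs(beliefs, min_confidence=0.9):
    """Turn high-confidence beliefs into system-prompt lines.

    Illustrative only: the cutoff and phrasing are assumptions.
    """
    lines = ["Known facts about this project (learned from past sessions):"]
    for belief in beliefs:
        if belief["confidence"] >= min_confidence:
            lines.append(f"- {belief['fact']} (confidence {belief['confidence']:.2f})")
    return "\n".join(lines)

beliefs = [
    {"fact": "pnpm_not_npm", "confidence": 0.97},
    {"fact": "tabs_not_spaces", "confidence": 0.60},  # too weak to surface yet
]
preamble = render_beliefs(beliefs)
```

Low-confidence patterns stay out of the prompt on purpose: surfacing a half-formed belief is how you get an agent that confidently repeats a guess.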
Drop this into any agent project to start capturing corrections:
```python
# memory_layer.py — minimal ECHO implementation
import hashlib
import json
import time
from pathlib import Path


class ECHOMemory:
    def __init__(self, memory_dir="~/.agent-memory"):
        self.dir = Path(memory_dir).expanduser()
        self.dir.mkdir(parents=True, exist_ok=True)
        self.experiences = self._load("experiences.json", [])
        self.beliefs = self._load("beliefs.json", {})

    def _load(self, name, default):
        # Read persisted JSON, or fall back to the default on first run.
        path = self.dir / name
        return json.loads(path.read_text()) if path.exists() else default

    def _save(self, name, data):
        (self.dir / name).write_text(json.dumps(data, indent=2))

    def capture_correction(self, wrong, right, scope):
        exp = {
            "type": "correction", "wrong": wrong, "right": right,
            "scope": scope, "timestamp": time.time(),
            "id": hashlib.md5(f"{wrong}:{right}".encode()).hexdigest(),
        }
        self.experiences.append(exp)
        self._maybe_promote(scope)
        self._save("experiences.json", self.experiences)

    def _maybe_promote(self, scope):
        related = [e for e in self.experiences if e["scope"] == scope]
        if len(related) >= 3:
            # Auto-promote to belief once enough corrections accumulate
            latest = related[-1]
            self.beliefs[scope] = {
                "fact": f"{latest['wrong']} → {latest['right']}",
                "confidence": min(0.99, 0.7 + len(related) * 0.05),
                "observations": len(related),
            }
            self._save("beliefs.json", self.beliefs)

    def recall(self, query_scope):
        return self.beliefs.get(query_scope)
```
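One piece the minimal version deliberately skips is deduplication (the config's `dedup_threshold: 0.85`). A sketch of collapsing near-duplicate memories; I'm using `difflib.SequenceMatcher` as a stand-in similarity measure, where the full system would compare TF-IDF or embedding vectors:

```python
from difflib import SequenceMatcher

def dedup(memories, threshold=0.85):
    """Keep only memories not too similar to one already kept.

    SequenceMatcher is a stand-in; the real system compares
    TF-IDF or embedding vectors, not raw text.
    """
    kept = []
    for text in memories:
        if all(SequenceMatcher(None, text, k).ratio() < threshold for k in kept):
            kept.append(text)
    return kept

memories = [
    "user prefers pnpm over npm",
    "user prefers pnpm over npm always",        # near-duplicate, collapsed
    "deploy target switched from Python to Go",
]
unique = dedup(memories)  # keeps the first and third entries
```

Without this, three restatements of the same correction eat three of your `max_results: 10` retrieval slots.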
I'm working on inter-agent memory sharing — where one agent's learned skills transfer to another. The hard part isn't the data format. It's trust: should Agent B trust Agent A's memories?
If you're building agents that run more than once, you need memory that learns. Not bigger context windows. Not better embeddings. A learning architecture.
The tools: agent-memory-system | bio-memory-layer | agent-memory-skill-compounding