
May 14, 2026

MEMORY.md Is the Logbook You Haven't Been Keeping

ai · leadership · engineering-culture · spec-driven-development · agentic-workflows · claude-code

You know the one. An API somewhere in your codebase uses x-api-auth instead of x-api-key or Authorization. It’s been that way for three years. You wrote the integration. You documented it in a Notion page that has since been archived. You’ve now re-asked yourself “wait, is it x-api-auth or X-Auth-Token?” enough times that you’d be embarrassed to count.

That header is real. It’s in the codebase. It’s just not in your head, and the codebase doesn’t surface it where you need it. The information exists. The connection between you and the information is what fails.

This post is about the file where that information goes to live. And about why the AI engineer (the role I described in yesterday’s post) needs this file to do their job.

Every codebase has these

The x-api-auth header is one example. Pick any codebase you’ve worked in for more than six months and the list gets long.

  • The helper at src/utils/normalize-phone.ts that everyone uses but nobody documented. It handles +1 country codes by default, but only if the input is at least seven characters. You only know this because you wrote it.
  • The if (env !== 'prod') guard in payments.service.ts that came from a 2023 incident nobody on the current team was around for. The guard is doing real work. The reason is in someone’s head, or in an archived postmortem.
  • The fact that the events table has a published_at column you should always query, even though created_at is what the schema docs reference. The data team migrated quietly two quarters ago. Everyone who needs to know has been told individually, in DMs, six times.
  • The naming convention where any background job ending in _sweep runs hourly, but jobs ending in _reaper run nightly. This is in the README. The README is 4,000 lines long.

This is the undocumented operational layer of any non-trivial codebase. It exists. It’s load-bearing. Nobody owns it. New engineers re-derive it from running the code and asking the wrong questions in Slack. Senior engineers either remember it or have a private Notes.app full of fragments from the last time they hit it.

Most engineering teams have accepted this as the cost of doing business. Architecture diagrams document the macro shape. CLAUDE.md / README / runbook docs catch the things someone thought to write down. Everything else lives in people’s heads and decays.

This was tolerable when the loop went: human reads ticket → human asks the codebase → human asks Slack → human ships. The lossiness was hidden inside the human’s attention span. The cost was conversation overhead, not lost work.

It is not tolerable anymore.

Why this matters more now

The AI engineering loop has compressed everything except the remembering.

The agent reads the ticket. The agent dispatches across files. The agent writes the tests. The agent opens the PR. The reviewer agent reads the diff cold and issues a verdict. The human’s job, in yesterday’s post, is to orchestrate, decide, and sequence. None of that took the remembering with it. If anything, the build phase shrinking made the remembering phase a larger fraction of the work.

The agent doesn’t remember x-api-auth across sessions. Each new agent loop starts with no idea that the codebase has this convention. The context window fills up with the current task, and the moment the session resets, the agent starts cold again. The first three rounds of any new session are spent re-discovering the things you already knew, because the agent has no continuity.

So you, the human, become the substrate that has to remember. You’re the one keeping the agent’s loop coherent across sessions. And — the thing nobody has been telling you — that responsibility scales faster in an AI engineering org than it did in a traditional one.

In a traditional org, you were responsible for remembering your own work. The team’s collective memory lived in code review history, Slack threads, and the documentation people felt motivated to write.

In an AI engineering org, you are responsible for remembering every quirk the agent will need to know to do your work correctly — and you are responsible for surfacing those quirks every time you brief an agent. The agent doesn’t have your Slack history. The agent doesn’t have your last-incident report. The agent has the brief you wrote it, plus whatever it can derive from the code in its context window.

If the brief doesn’t include “use x-api-auth, not x-api-key, see file src/integrations/legacy-payment.ts,” the agent will guess. Guesses cost time. Sometimes they cost more than time.

The release manager’s logbook

Yesterday’s post made the claim that the AI engineer is a release manager. The release management discipline has many artifacts; one of them is the logbook.

The logbook in a regulated industry is the documented running history of decisions: which changes shipped, in what wave, with what known risks, with what reasoning. It’s how the on-call at 3am knows whether last week’s rollback decision was “yes we reverted, here’s why, here’s what to do if it comes back” or “no we shipped the fix, here’s why.” It’s the artifact that survives across people and shifts.

A release manager without a logbook is just a person guessing at history. Engineers who came up writing implementation code largely have not been trained to keep one. The codebase felt like enough.

In the AI engineering loop, the codebase still isn’t enough — it has never been enough — but now the gap matters more. The undocumented operational layer (the x-api-auth header, the published_at convention, the _sweep vs. _reaper scheduling convention) is the logbook entry the codebase doesn’t surface. It needs to live somewhere durable, indexed, and readable by the next session of you (or your agent).

MEMORY.md is one shape of that logbook. The pattern: a flat directory of small markdown files, each capturing one durable fact, with MEMORY.md as the index. Claude Code reads the index at the start of every session. The agent finds the entry that matches the work in front of it, reads the body, and operates with the context. No prompting overhead. No “remember to use x-api-auth” in every brief.

For the rest of this post I’ll use MEMORY.md as shorthand, but the substrate could be Cursor’s equivalent, a docs/internals/ folder you maintain by hand, a Notion database, or a tagged set of Obsidian notes. The shape matters less than the discipline.
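As a concrete sketch of the shape (file names, entry titles, and index lines here are illustrative, not prescriptive), the index might look like this:

```markdown
<!-- MEMORY.md — the index. One line per entry; bodies live in memory/ -->
# Memory index

- memory/auth-headers.md — legacy payment API wants `x-api-auth`, not `x-api-key` or `Authorization`
- memory/events-published-at.md — query `published_at` on `events`, not `created_at`; quiet migration
- memory/job-suffixes.md — `_sweep` jobs run hourly, `_reaper` jobs run nightly
```

The agent reads this index at session start, then opens only the entry files that match the task at hand.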

What to save

This is the part most teams get wrong. They either save nothing — because the discipline isn’t established — or they save a junk drawer of everything, which is worse because the index becomes unreadable and the entries become unreliable.

Three categories earn their keep.

Undocumented patterns of your existing system. The x-api-auth example. Any convention that exists in the codebase but is not surfaced where you’d look for it. Any helper that is named obviously but behaves non-obviously. Any field name that changed without the schema docs updating. Any guard that survives because of an incident.

For each entry, capture the rule, the why (the incident, the convention, the constraint), and the place in the codebase to verify it still holds.
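A single entry, following that rule / why / verify shape, might read like this (the file path and details mirror the running example; treat them as hypothetical):

```markdown
# auth-headers

**Rule:** The legacy payment API authenticates with the `x-api-auth` header,
not `x-api-key` or `Authorization`.

**Why:** The integration predates our standard auth middleware and the vendor
never migrated. Using the standard header fails silently with a 200 and an
empty body.

**Verify:** `src/integrations/legacy-payment.ts` — the header name is set in
the request builder there.
```

The verify line is what keeps the entry honest: it points at the place in the codebase where a future reader can confirm the rule still holds.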

Calibrations from working together. What worked, what didn’t, what got pushed back on. “Don’t run the image generation flow during business hours, costs API credit and the bookkeeping line-items get noisy.” “OG cards must be PNG, social scrapers reject SVG.” “When the reviewer rejects a deferral, default to the reviewer’s verdict — the producer’s bias is structural.”

These are not preferences in the abstract. They are the operational version of the team’s accumulated lessons, written down where the agent (and the next session of you) can act on them.

Operational IDs that don’t live anywhere else. The Buffer channel IDs you had to look up the first time. The org ID that an integration resolved automatically the second time. The Linear project where pipeline bugs are tracked. Anything that is “I had to look this up once, I’ll have to look it up every time, why is it not somewhere stable.”

This is the category that proves itself fastest. The next time you need the value, the next session of your agent doesn’t have to ask.
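An operational-IDs entry can be as small as a lookup table. A sketch (every value below is a placeholder, not a real ID):

```markdown
# buffer-channels

**Rule:** Posts go out via these Buffer channel IDs; they are not
discoverable from the repo or the API docs.

- twitter: `ch_xxxxxxxx`
- linkedin: `ch_yyyyyyyy`

**Verify:** Buffer dashboard → Settings → Channels.
```

Even a two-line entry like this pays for itself the second time anyone, human or agent, would otherwise have to go hunting.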

What NOT to save

The failure mode of memory is not under-collection. It is over-collection, plus staleness.

Things that look memory-worthy but aren’t:

Anything derivable from the codebase or git history. “The auth module lives in src/auth/.” Run ls src/. “The test runner is Vitest.” Read package.json. “Last PR was about the OG card pipeline.” Run git log. The codebase is authoritative for these. Memory is for what the codebase doesn’t document, not for re-stating what it already does.

Anything captured in CLAUDE.md. The repo-level operating rules belong in CLAUDE.md. The memory layer is for the things that don’t rise to “always true” but do rise to “I keep needing this.”

In-progress task state. The session-level scratchpad for what you’re currently doing is not memory. It’s working state. It belongs in the current conversation, the current plan, or a TODO list — not in the memory layer that future sessions read.

Anything that’s currently working but might change. “The login endpoint returns 200 for success.” Yes, today. Maybe not tomorrow. Memory entries that describe current behavior, rather than durable constraints, decay into liars within a quarter.

The right test for a memory entry: if I delete this entry and re-derive it next time, what’s the cost? If the cost is one grep, don’t save it. If the cost is asking a teammate, looking through a year of Slack, or re-running an experiment, save it.

The two failure modes: junk drawer and staleness

The teams I see bounce off the memory pattern do so for one of two reasons. The first is they treat it as a junk drawer and the index becomes a 500-line wall of entries that nobody reads. The second is they write the entries once and never update them, and within six months half the entries describe a system that no longer exists.

The first is easier to avoid: keep the index tight (one line per entry, under 150 characters, no walls of prose at the index level). Write the entries themselves with detail, but the index is the throttle. The agent decides what to read based on the index; if the index is unreadable, the agent reads nothing useful.

The second is harder. Memory entries decay. The auth library you used to recommend gets replaced. The Buffer channel IDs rotate. The team convention you wrote down two quarters ago became official policy and now lives in CLAUDE.md instead. The discipline of memory is prune as much as you save. Every time you find an entry that’s no longer true, delete or update it before you do anything else.

This is the part the pattern’s critics get right. Memory without pruning is worse than no memory, because stale memory misleads the agent into shipping the wrong thing with high confidence.

The trick: every time you read an entry as part of normal work, verify it against current reality. If it still holds, fine. If it doesn’t, update or delete on the spot. This is one line of discipline. It’s also what makes the rest of the pattern work.

What to do Monday

Pick three things you re-derive every week. Things you’ve explained to a new hire or asked a teammate about more than twice. Things you have to look up because they live in code but not in any obvious place.

Write them down. Each one in its own file. Title that names what the entry is about, not what the entry says. Index line in MEMORY.md that gives the one-line hook in under 150 characters.

That’s the start. You will discover, within two weeks, that the discipline of writing them down has changed how you brief the agent — because now the brief can say “see memory entry on auth conventions” instead of re-typing the conventions in every brief. And the agent’s output gets noticeably better because it’s no longer guessing on the things you’d already know.

Within a quarter, the memory directory becomes the operational logbook of your team’s AI engineering work. Decisions that used to live in your head live here. Conventions that used to survive in Slack threads survive here. The undocumented half of your codebase finally has a home that isn’t your private notes file.

This is not glamorous work. It is the work the release management discipline was built around: write down what matters, prune what doesn’t, trust the artifact more than your memory. The same discipline that has kept regulated engineering orgs shipping reliably for twenty years now applies to a team of three with three agents each.

The codebase will never document the things in your head. CLAUDE.md will never be detailed enough. Slack will keep eating everything older than three months. The logbook is the layer between all of that and the agent loop. It has been the missing piece. Now you know where to put it.

If you’re trying to retool your team’s AI engineering practice and the memory layer keeps coming up as “we know we need this but we don’t have a shape for it yet,” book a call. I’d rather your team build the logbook discipline on purpose than after the second time the agent reshipped a broken integration because nobody told it about x-api-auth.