Keeping a personal codebase clean when AI makes code cheap

AI changes the economics of personal software in one very specific way: typing code is no longer the expensive part.

That sounds liberating until the repo starts filling with helper files, generated abstractions, half-finished experiments, duplicate scripts, and tests that mostly prove the mocks are obedient little pets.

A personal codebase does not decay because one bad pull request lands. It decays because nobody is around to say no. There is no staff engineer blocking the architecture astronautics. No teammate asking why the new sync layer has six protocols. No reviewer gently suggesting that the agent has invented a framework because it was bored.

If you use AI heavily on personal projects, keeping the codebase clean becomes a separate discipline. Not a vibe. Not “I’ll tidy this later.” Later is where side projects go to become compost.

1. Treat generated code as surplus by default

The first mistake is assuming generated code deserves to stay because it was cheap to produce.

Cheap code is still expensive to own.

Every file adds search noise. Every abstraction adds a path to misunderstand. Every script becomes a tiny future dependency with a tiny future failure mode. The model does not pay that bill. You do.

My default posture with generated code is simple:

Accept the smallest part that solves the current problem.
Delete the scaffolding the moment it has served its purpose.
Refuse any abstraction that does not reduce a real maintenance cost today.

This feels wasteful if you still think in keystrokes. It feels sane if you think in future attention.

The agent can generate another helper tomorrow. It cannot give you back the hour you spend next month figuring out which helper is the real one.

2. Keep experiments outside the main app

AI makes experiments dangerously easy.

That is useful. It is also how a weekend project grows a CoreExperimentalServices folder, which is usually the software equivalent of a drawer full of unlabeled cables.

I keep experiments in a scratch area first:

A separate branch for anything that may not ship.
A scratch/ folder ignored by the app target, or a separate throwaway repo if the idea is messy.
A short note at the top of the experiment explaining what question it answers.

The question matters. “Try new architecture” is not a question. “Can this local search index handle 20,000 records without making launch worse?” is.

If the experiment answers the question, promote only the useful parts into the real code. If it does not, delete it. Do not preserve failed experiments as archaeological evidence. Future you is not a museum curator.

3. Make deletion a normal part of the loop

Most personal repos are bad at deletion because deletion feels like losing work.

It is not. It is refusing to store debt.

When an agent writes a change, I do a deletion pass before I do a polish pass. I look for:

duplicate types that already exist under a different name
“generic” helpers used once
protocols with one implementation and no test seam
comments explaining obvious code
fallback paths nobody requested
tests for behavior that is not a product rule
generated scripts that replace one shell command with a maintenance liability

This is not cleanup after the work. It is part of the work.
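One item on that list, the “generic” helper used once, can be spotted mechanically. Here is a minimal sketch, assuming a Python project small enough to parse in one pass. The name single_use_functions is made up for illustration, and the counting is deliberately crude, so treat each hit as something to read, not a verdict.

```python
import ast
from collections import Counter


def single_use_functions(sources: dict[str, str]) -> list[str]:
    """Return top-level function names referenced at most once across all sources.

    `sources` maps a file name to its text. References are counted by bare
    name (foo) and by attribute (module.foo); imports are not counted as uses.
    """
    defined: set[str] = set()
    uses: Counter[str] = Counter()
    for text in sources.values():
        tree = ast.parse(text)
        # Only top-level definitions are considered helpers here.
        for node in tree.body:
            if isinstance(node, ast.FunctionDef):
                defined.add(node.name)
        # Count every place a name is read or called anywhere in the file.
        for node in ast.walk(tree):
            if isinstance(node, ast.Name):
                uses[node.id] += 1
            elif isinstance(node, ast.Attribute):
                uses[node.attr] += 1
    return sorted(name for name in defined if uses[name] <= 1)
```

A helper that shows up here is not automatically wrong. It is just a good place to ask whether the function earns its name or should be inlined at its single call site.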

A good AI-assisted diff is often smaller after review than before. If the final diff is larger only because the model wanted the code to look designed, the model got confused. Happens to the best raccoons.

4. Write down the boring boundaries

Personal projects rarely have process, which is fine. They still need boundaries.

The fastest way to keep an AI-assisted repo clean is to write a short AGENTS.md or equivalent project note with the rules the agent must not rediscover every session.

Mine usually covers:

The build, lint, and test commands.
The folders the agent can edit freely.
The folders it should not touch unless the task names them.
Naming rules for services, stores, view models, and tests.
The preferred patterns for state, persistence, networking, and logging.
The patterns that are banned because they already hurt once.

The banned list is the important part.

“No new dependency injection framework.”

“No protocol unless there are two real implementations or a useful test seam.”

“No new global singleton.”

“Do not rewrite working code to match a pattern from the internet.”

This is not ceremony. It is a cheap immune system. Without it, every agent session starts with the confidence of a new consultant and the memory of a goldfish.

5. Keep one obvious path through the code

AI loves offering options. Personal codebases do not need options. They need one boring path that future you can follow while tired.

If the app has one persistence layer, use it everywhere. If it has one logging wrapper, use it everywhere. If feature state lives in observable stores, do not let one generated screen introduce a new reducer style because it looked elegant in isolation.

Consistency beats local cleverness in a personal repo because the maintainer is also the entire onboarding department.

This is where generated code needs pressure. The model will happily produce code that is correct in the small and alien in the repo. It will use a different naming convention, a different error style, a different async pattern, and a different test shape. None of those differences looks fatal alone. Together they turn the project into a thrift store.

Reject the alien shape even if it works.

The question is not “does this compile?” The question is “will I know where to fix this in six months?”

6. Do not let AI create architecture by accumulation

Architecture should appear because pressure forced it, not because five generated changes left behind five slightly different helpers.

The most common failure pattern looks like this:

The agent adds a small helper for a feature.
A later session does not notice the helper and adds another one.
A third session sees both and creates a generic abstraction to unify them.
Now the app has an architecture.

Congratulations. You have discovered sedimentary design.

Stop it earlier.

When two helpers overlap, delete one. When three call sites need the same behavior, extract it once. When a feature needs a boundary, name the boundary after the product concept, not after a design pattern.
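Overlap is easier to delete when something points at it. This sketch, again assuming Python source, groups top-level functions whose parameters and bodies are structurally identical. The function name duplicate_helpers is hypothetical. It only catches exact copies under different names, which is the cheapest kind of sediment to find; near-duplicates still need a human read.

```python
import ast
from collections import defaultdict


def duplicate_helpers(source: str) -> list[list[str]]:
    """Group top-level functions that share the same parameters and body.

    ast.dump omits line numbers by default, so two functions with the same
    structure produce the same string even though they sit on different lines.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            shape = ast.dump(node.args) + "|" + "".join(
                ast.dump(stmt) for stmt in node.body
            )
            groups[shape].append(node.name)
    # Keep only shapes that more than one function shares.
    return [names for names in groups.values() if len(names) > 1]
```

Each group it returns is a candidate for the “delete one” rule: keep the name that matches the product concept and point the other call sites at it.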

Good personal-project architecture is usually boring:

feature code grouped by feature
shared code grouped only after it is genuinely shared
side effects behind narrow services
UI state owned close to the UI unless there is a reason to move it out
no new layer unless the current layer is visibly failing

This will not impress anyone on a conference slide. Excellent. Conference slides do not maintain your app.

7. Review the diff like you are inheriting it

The trick with personal projects is that you are always inheriting your own codebase.

You write something on Sunday. You come back three weeks later with no cache in your head and a suspicious feeling that past you was under-caffeinated. If AI wrote half of it, the suspicion is justified.

Review every generated diff with that future version of yourself in mind:

Can I explain why each new file exists?
Can I delete any part without changing behavior?
Does the code follow the existing shape of the repo?
Is the hard behavior covered by tests or only by optimism?
Did the agent change something outside the requested area?
Would I be annoyed to debug this on a bad Tuesday?

That last question is underrated. Personal software is maintained in scraps of attention. After work. Between meetings. Late at night when the build system decides to develop a personality.

Code that requires perfect context to understand is hostile to the person who will actually maintain it.
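One of the review questions, whether the agent changed something outside the requested area, does not need judgment at all. A minimal sketch, assuming you can get the changed paths as plain strings (for example from git diff --name-only) and that you name the allowed folders yourself. The function name out_of_scope and the folder names in the usage are illustrative only. It relies on PurePosixPath.is_relative_to, which needs Python 3.9 or newer.

```python
from pathlib import PurePosixPath


def out_of_scope(changed: list[str], allowed_dirs: list[str]) -> list[str]:
    """Return the changed paths that are not inside any allowed directory.

    Matching is by path component, so "Features/SearchLegacy/x" is not
    treated as being inside "Features/Search".
    """
    allowed = [PurePosixPath(d) for d in allowed_dirs]
    strays = []
    for path in changed:
        candidate = PurePosixPath(path)
        if not any(candidate.is_relative_to(root) for root in allowed):
            strays.append(path)
    return strays


changed = [
    "Features/Search/Index.swift",
    "Features/SearchLegacy/Old.swift",
    "Package.swift",
]
print(out_of_scope(changed, ["Features/Search"]))
# → ['Features/SearchLegacy/Old.swift', 'Package.swift']
```

An empty result does not mean the diff is good. It only means the surprises, if any, are where you asked for them.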

8. Keep dependencies on a short leash

AI tools are very comfortable adding dependencies. They have read every README. They have no shame.

A new package can be the right move, but the bar should be higher in a personal codebase because nobody else is paid to track the fallout.

Before accepting a dependency, ask:

Does the standard library or platform already do this well enough?
Is the problem important enough to outsource?
Is the package maintained?
Does it bring transitive dependencies I do not want?
Can I remove it cleanly later?

For small utilities, the dependency often fails that checklist. A 20-line local function is not always worse than a dependency. Sometimes it is the adult choice, which is irritating because adulthood was not advertised in the brochure.

Agents tend to optimize for visible progress. Dependencies create visible progress quickly. Ownership shows up later.

Plan for later.
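To make the 20-line claim concrete, here is the kind of thing an agent will often reach for a package to do: retrying a flaky call with a growing delay. This is a sketch, not a replacement for any particular library. The name retry and its parameters are my own, and the injectable sleep argument exists only so the function can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 3,
    delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `action` until it succeeds or `attempts` is exhausted.

    The wait doubles after each failure: delay, 2 * delay, 4 * delay, ...
    The last failure is re-raised unchanged so the caller sees the real error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(delay * (2 ** attempt))
```

It has no jitter, no per-exception policy, and no async support. If the project genuinely needs those, that is the moment the dependency starts to earn its place.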

9. Use tests as a filter, not decoration

Generated tests can be useful. They can also be extremely decorative.

The bad version creates a mock, tells the mock what to return, calls the function, then celebrates when the function returns what the mock was told to return. This is not verification. It is a puppet show with assertions.

For personal projects, tests should protect the behavior you are likely to break when you forget the implementation details:

parsing messy real input
migration logic
sync conflict resolution
permission and cancellation paths
date, timezone, and locale behavior
feature flags and entitlement decisions
anything that once broke in production or on your own device

Do not require tests for every generated line. That turns the project into a paperwork exercise. Require tests where the cost of being wrong is higher than the cost of writing the test.

The test suite is not there to flatter the diff. It is there to make future changes less scary.
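The difference between the puppet show and a useful test is easier to see side by side. A small sketch using Python's unittest.mock; parse_duration is a hypothetical helper invented for the example, standing in for whatever messy-input parsing your project actually does.

```python
from unittest.mock import Mock


def parse_duration(text: str) -> int:
    """Parse strings like '1h 30m' or '45m' into minutes (hypothetical helper)."""
    total = 0
    for part in text.lower().split():
        if part.endswith("h"):
            total += int(part[:-1]) * 60
        elif part.endswith("m"):
            total += int(part[:-1])
        else:
            raise ValueError(f"unrecognized duration part: {part!r}")
    return total


def test_puppet_show():
    # Decorative: the assertion only restates what the mock was told to return.
    # No line of parse_duration runs, so this passes even if the parser is broken.
    parser = Mock()
    parser.parse.return_value = 90
    assert parser.parse("1h 30m") == 90


def test_messy_real_input():
    # Useful: runs the real function against input a person might actually type.
    assert parse_duration("1h 30m") == 90
    assert parse_duration("  2H   5M ") == 125
    assert parse_duration("45m") == 45
```

Both tests pass today. Only the second one fails when someone, possibly an agent, rewrites the parser and forgets about uppercase units.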

10. End each session with a smaller repo

The best habit I know is ending an AI-assisted session with a smaller, clearer repo than the one the agent tried to leave behind.

That does not mean fewer lines every time. It means fewer unexplained things.

Before committing, do a short closeout:

Remove scratch files and unused helpers.
Revert unrelated edits.
Run the formatter, linter, and tests.
Read the final diff without the agent’s summary.
Write the commit message in human language.

The agent’s summary is useful, but it is not evidence. The diff is evidence. The checks are evidence. The fact that the app still behaves correctly is evidence.

AI makes code cheap. It does not make maintenance cheap.

That is the whole discipline: accept speed where it helps, reject volume where it doesn’t, and keep the repo shaped like something one tired human can still understand.