A practical setup for AI-assisted Apple-platform development

Most “AI for iOS” advice stops at the prompt. That is the easy part.

The hard part is the rest of the wiring. Which agents run where. Which tools they can touch. What the repo says before anyone asks. How sessions persist. How the build still passes when nobody has paid attention in three days.

This post is about that wiring. It is the setup I actually use on Apple-platform projects in 2026, with the parts that earn their place and the parts that got cut.

1. The setup is not the model

The model is a commodity. The setup is the product.

A new coding agent with a clean repo, a clear AGENTS.md, a small MCP layer, and a working test loop is more useful on day one than a better model in a junk drawer. The team that treats the model as the only lever spends the next year re-prompting its way out of problems the wiring should have prevented.

Decide what the setup has to do. For me, that is four things:

The agent knows the project without me explaining it every time.
The agent can read more than it can write, and the writes it gets are scoped.
The build, tests, and linter still run when the agent is the one running them.
I can stop the agent cold when something looks off, with no roll-back theater.

Everything else is decoration until it serves one of those.

2. The repo is the contract

If the agent does not know the project, the project has not told it.

A useful Apple-platform repo has four documents, all in the root:

README.md for humans: what the app is, how to build, how to run, where the major code lives.
AGENTS.md for agents: build commands, test commands, linter, formatting, conventions, the things not to touch, and the small set of patterns the project actually uses.
ARCHITECTURE.md (optional, but worth it on anything larger than a sample): modules, layering rules, where state lives, where side effects live.
CHANGELOG.md updated by humans, not the model.

The agent should be able to answer the boring questions from these files alone. Which scheme to open. Which target to build. How to run a single test. Where the analytics calls live. Where the feature flags are declared.

If the answer to “how do I run the tests” is in a Slack thread from last quarter, the answer is gone. Write it down. The agent reads it for free, and so do new teammates.
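A check like this can keep the contract from quietly eroding. It is a minimal sketch, assuming the documents live in the repo root under exactly the names listed above; the function names are mine, not part of any tool.

```shell
#!/bin/sh
# Sketch: report which root documents are missing.
# ARCHITECTURE.md is treated as optional, matching the text above.
# Function names are hypothetical.

# missing_docs <repo-root>
# Prints one line per missing required document.
missing_docs() {
  root="$1"
  for doc in README.md AGENTS.md CHANGELOG.md; do
    [ -f "$root/$doc" ] || printf '%s\n' "$doc"
  done
}

# optional_docs_note <repo-root>
# Prints a note, not an error, when the optional document is absent.
optional_docs_note() {
  root="$1"
  [ -f "$root/ARCHITECTURE.md" ] || printf 'note: ARCHITECTURE.md not found (optional)\n'
}
```

Running `missing_docs .` from the repo root prints nothing when the contract is complete, which makes it easy to drop into CI as a cheap guard.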

3. A minimal `AGENTS.md` that actually helps

Most AGENTS.md files I see fall into two camps: empty, or a constitution. Both are useless.

Empty is obvious. A constitution is the opposite failure: a page of aspirational rules the model will ignore as soon as the token budget gets tight.

The useful middle is a short, opinionated runbook. Something like:

# AGENTS.md

## Build and test
- Open `App.xcworkspace` (not the project file) with Xcode 16.
- Run unit tests: `xcodebuild test -workspace App.xcworkspace -scheme App -destination 'platform=iOS Simulator,name=iPhone 15'`.
- Run linter: `scripts/lint.sh`.
- Format Swift with `swift-format` (config in `.swift-format`).

## Conventions
- SwiftUI for new UI, UIKit only when a feature requires it.
- New code uses Swift 6 strict concurrency. Mark `@MainActor` at the boundary, not on every type.
- Errors carry `UserFacingError` and are mapped to UI through `ErrorPresenter`.
- Networking lives in `Networking/`. No new HTTP code in feature modules.
- Logging uses `Log.xcframework`, never `print`.

## Off-limits
- Do not edit `Pods/`, `DerivedData/`, `App/Resources/Generated/`.
- Do not touch signing, capabilities, or `Info.plist` unless the task is specifically about them.
- Do not change the deployment target.

## Patterns to follow
- View models use the `State` + `Action` enum shape from `AppCore/ViewModel.swift`.
- Async work uses the `Loadable<Value>` wrapper for view state.
- Tests follow the `Given-When-Then` structure used in `AppCoreTests/`.

That is the kind of file a coding agent can use in one pass. It points at the right build commands, names the patterns, and lists the off-limits zones.

Two things are deliberately missing: politeness, and a philosophy. The model does not need to be thanked. It also does not need a paragraph about “always prioritize user safety” that restates the obvious. The runbook is for shipping code, not for moral theater.
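The off-limits list is also the easiest part of the runbook to enforce mechanically. The sketch below assumes the three directory prefixes from the sample file and a list of changed paths on stdin; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: flag changed paths that fall inside the off-limits zones
# from the sample AGENTS.md. The prefixes are copied from that sample.

# off_limits_paths
# Reads one path per line on stdin, prints the ones that are off-limits.
off_limits_paths() {
  while IFS= read -r path; do
    case "$path" in
      Pods/*|DerivedData/*|App/Resources/Generated/*)
        printf '%s\n' "$path" ;;
    esac
  done
}
```

One way to use it is `git diff --cached --name-only | off_limits_paths`: any output means the change touched something the runbook says to leave alone. Signing, capabilities, and `Info.plist` are left out on purpose, because the sample allows them when the task is specifically about them.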

4. The MCP layer stays small

The MCP layer is the second most over-instrumented thing in a typical setup, after the prompt templates.

I run four MCP servers on a real Apple-platform project. That is the entire list:

Source code and tests (filesystem MCP, scoped to the repo).
Issues (Linear or GitHub, scoped to the current project).
Docs and design files (Notion or local markdown, read-only).
Logs and crashes for the most recent build (read-only, time-windowed).

No database. No production data. No internal admin tools. No customer support inbox. The agent can answer “what does this error mean in the last build” without being able to look at a user’s name.

The rule is simple. The agent gets read access to anything a senior engineer would consult. It gets write access to exactly the things the task demands, and not a byte more. Any MCP server that does not earn its place in the first two weeks gets removed. Stale MCP servers are worse than missing ones because the agent will still try them, then waste your time explaining why a tool nobody remembered is broken.

5. One agent per job, not one agent for everything

“Use AI for everything” is the same mistake as “use the Swiss Army knife for everything.” It works until the screw needs a Phillips head.

In practice, I run three different agents with different shapes:

A fast chat agent in the IDE for the small, local questions: rename this, explain this API, draft a unit test for this function. No tools, no repo access beyond the open files.
A coding agent in the terminal or a dedicated app for changes that touch more than one file: refactors, new features, multi-file bug fixes. Repo write access, scoped MCP, runs the test loop.
A reviewer agent that reads diffs and produces a structured review: lifecycle edges, concurrency ownership, test coverage, naming, platform constraints. No write access at all.

Trying to make one agent do all three is how you end up with a model that confidently renames a private function and then drafts a pull request about it.

6. Local tools that earn their disk space

A few small tools make the rest of the setup feel less like a science project.

A scratch repo for agent experiments. I keep one on every machine, with its own AGENTS.md that says “this is throwaway, do not save anything important here.” When the agent gets an idea, it goes there first. If the idea survives a build and a test run, it gets promoted into the real project.
A shared prompts/ folder. Reusable briefs, in plain text, versioned like code. New feature brief. Bug fix brief. Refactor brief. Test addition brief. If a prompt is good enough to use twice, it is good enough to live in the repo.
A small scripts/ folder with one shell script per common loop. scripts/test.sh, scripts/lint.sh, scripts/build.sh. The agent calls these instead of remembering the exact xcodebuild incantation, which means the incantation does not rot when the project upgrades Xcode.
A second machine or container for risky work. Builds, signing changes, anything that touches Keychain access groups, all happen somewhere I can throw away. Not because the agent is malicious. Because phones and computers misbehave often enough that isolation is cheaper than debugging in the moment.

The list is short on purpose. Each tool that lives in the setup is one more thing to maintain, and one more thing that can go wrong while the agent is the one driving.
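As a rough sketch of what scripts/test.sh might hold: the workspace, scheme, and destination below are copied from the sample AGENTS.md, while the `XCODE_*` variable names, the `DRY_RUN` switch, and the optional test filter argument are my assumptions. The filter uses xcodebuild's `-only-testing:` flag.

```shell
#!/bin/sh
# scripts/test.sh (sketch): one place for the xcodebuild test incantation.
# Defaults come from the sample AGENTS.md; override them via the environment.
# XCODE_* names and DRY_RUN are assumptions for illustration.

XCODE_WORKSPACE="${XCODE_WORKSPACE:-App.xcworkspace}"
XCODE_SCHEME="${XCODE_SCHEME:-App}"
XCODE_DESTINATION="${XCODE_DESTINATION:-platform=iOS Simulator,name=iPhone 15}"

# run_tests [Target/Class/testMethod]
# With an argument, only that test (or class, or target) runs.
run_tests() {
  # Rebuild the positional parameters as the full command line.
  set -- xcodebuild test \
    -workspace "$XCODE_WORKSPACE" \
    -scheme "$XCODE_SCHEME" \
    -destination "$XCODE_DESTINATION" \
    ${1:+"-only-testing:$1"}

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it (quoting is not preserved).
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# In the real script, the last line would be:
#   run_tests "$@"
```

When the project moves to a new Xcode or a new simulator, the change happens in this one file, and neither AGENTS.md nor the agent's habits need to follow.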

7. Sessions, memory, and not asking the same question twice

The single biggest productivity tax in an AI-assisted workflow is repetition. The agent forgets. The next session starts from zero. The third session asks the same question as the first.

The fix is project memory, kept somewhere the agent can read.

For me that is three layers:

AGENTS.md for things that never change: build commands, conventions, off-limits zones.
A short NOTES.md in the repo for things that change slowly: current focus, in-progress refactors, recent decisions.
Session logs that go into docs/agent-sessions/ and get pruned weekly. Each session gets a markdown file with the goal, the steps, the surprises, and the outcome.

The session log is the part most teams skip. It is also the part that compounds. Six months in, a new agent session can read the last three logs for the area it is about to touch, and it does not start the project from scratch. So can a new human teammate.

Do not let the agent write into any of these files unsupervised. The agent reads, the human writes. That keeps the memory honest.
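A small helper makes the session log cheap enough that it actually gets written. This is a sketch, assuming a date-plus-slug file name (my convention, not a requirement) and the four headings named above; it only creates the empty template, and the human fills it in.

```shell
#!/bin/sh
# Sketch: create an empty session log with the four headings from the text.
# The naming scheme (YYYY-MM-DD-slug.md) and function name are assumptions.

# new_session_log <slug> [dir]
# Creates the file and prints its path; refuses to overwrite an existing log.
new_session_log() {
  slug="$1"
  dir="${2:-docs/agent-sessions}"
  file="$dir/$(date +%Y-%m-%d)-$slug.md"
  mkdir -p "$dir"
  if [ -e "$file" ]; then
    printf 'refusing to overwrite %s\n' "$file" >&2
    return 1
  fi
  cat > "$file" <<EOF
# Session: $slug

## Goal

## Steps

## Surprises

## Outcome
EOF
  printf '%s\n' "$file"
}
```

Calling `new_session_log login-refactor` from the repo root would create today's file under docs/agent-sessions/ and print its path.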

8. The verification loop is the only loop that matters

A coding agent that writes code I cannot verify is a writing partner with a typing habit. Useful sometimes. Not a collaborator.

The verification loop is where the setup either pays for itself or does not. On Apple-platform work, the loop has to cover five things, in this order:

The build: the workspace scheme, the right destination, the configuration that matches what is shipping.
The linter and formatter: not optional, not advisory, blocking.
The targeted unit tests for the changed code, run by name, fast.
The full unit test suite for the affected module, slower.
A real device run for anything that touches the camera, the keychain, push, background tasks, or StoreKit.

The agent runs the first four on its own. The fifth is a human step. Trying to automate the device run usually ends with the agent declaring victory on a simulator pass that means nothing on hardware.

If the loop fails at any step, the agent does not move forward. It fixes the failure, or it asks. “It probably works” is not a passing condition.
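The stop-on-first-failure rule is simple enough to sketch. The version below assumes each automated step is a command that exits non-zero on failure; the `verify` name is hypothetical, and the script paths in the closing comment are the ones from section 6 plus a made-up scripts/test-targeted.sh.

```shell
#!/bin/sh
# Sketch: run the automated verification steps in order and stop at the
# first failure. Each argument is one step, executed via `sh -c`.

verify() {
  for step in "$@"; do
    printf '==> %s\n' "$step"
    if ! sh -c "$step"; then
      printf 'FAILED: %s (stopping: fix it or ask)\n' "$step" >&2
      return 1
    fi
  done
  printf 'all automated steps passed; the device run is still a human step\n'
}

# A call in the order from the text might look like this
# (test-targeted.sh is a hypothetical name):
#   verify scripts/build.sh scripts/lint.sh scripts/test-targeted.sh scripts/test.sh
```

The point of the design is that later steps never run after a failure, so a green line at the bottom always means every earlier step passed.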

9. The boring parts: linters, formatters, git hooks

Most “AI workflows” I see have no pre-commit hooks, no format check, and no lint step. The agent edits, the human approves, and the build breaks on the next CI run because somebody forgot a trailing comma in a generated file.

The fix is to be the boring team. The setup has:

A formatter that runs on save, in the editor, and as a pre-commit hook.
A linter that fails the build on errors, not just warnings.
A pre-commit hook that runs the formatter, the linter, and the targeted tests for the changed area.
A CI step that runs the full unit test suite, the format check, the lint, and a build on at least one real device or a clean simulator.

None of that is AI-specific. All of it matters more when an agent is the one writing, because the agent will produce ten tidy files that all violate the same house rule in the same way. The hooks catch the violation before the diff leaves the branch.
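For illustration, a pre-commit hook along these lines could look like the sketch below. It assumes `swift-format` is on the PATH and that scripts/lint.sh and scripts/test.sh exist as described in section 6; the function names are hypothetical, and narrowing the tests to the changed area is left out because that mapping is project-specific.

```shell
#!/bin/sh
# .git/hooks/pre-commit (sketch). Assumes swift-format is installed and the
# scripts/ folder from section 6 exists. Function names are hypothetical.

# swift_files
# Keeps only Swift sources from a list of paths on stdin.
swift_files() {
  grep -E '\.swift$' || true
}

pre_commit() {
  # Staged files that were added, copied, or modified.
  files="$(git diff --cached --name-only --diff-filter=ACM | swift_files)"
  [ -n "$files" ] || return 0

  # Format check, one file at a time so paths with spaces survive.
  printf '%s\n' "$files" | while IFS= read -r f; do
    swift-format lint --strict "$f" || exit 1
  done || return 1

  scripts/lint.sh || return 1
  # Runs the whole suite here; a real hook would pass a narrower target.
  scripts/test.sh || return 1
}

# In the real hook, the last line would be:
#   pre_commit
```

If the hook gets slow enough that people start committing with `--no-verify`, that is a sign to move the test step to CI and keep only the format and lint checks local.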

10. When the agent is the wrong tool

The final piece of the setup is the rule for when to put the agent down.

A coding agent is the wrong tool when:

The change requires understanding a part of the codebase that has not been written down yet, and reading the code is faster than explaining the code to a model.
The change touches security boundaries, signing, capabilities, or anything the platform gates hard. AI can help draft, but a human must sign off.
The task is a one-line fix in a place the agent does not know, and the time spent briefing the agent is longer than the time spent fixing it.
The team is in incident mode. Speed matters more than the lesson, and the right person is at the keyboard anyway.

I keep these rules in AGENTS.md too, in a section called “When to ask a human first.” It is the most useful paragraph in the file.

11. Costs, rate limits, and the attention budget

Running agents is not free in 2026. But the invoice is the cost most teams track, and it is not the one that matters most.

The real cost is attention. A coding agent that suggests a plausible refactor every fifteen minutes trains the team to skim diffs. Skim-reviewing is the gateway drug to merge-without-reading. Once the team starts doing it, the loop is broken, and no number of post-merge checks will save the codebase.

A few habits that help:

Long-running agent sessions run in a dedicated app or terminal pane, with a clear output channel the human can ignore while doing other work.
The agent batches its questions instead of asking every thirty seconds. “Ask once, then go” is a better default than “ping me after every step.”
The human checks in at three checkpoints: the plan, the first pass, and the verification output. Not after every line.
Rate limits are a feature, not a bug. They are a forcing function for the human to do the parts that need a human.

12. A reasonable daily rhythm

If I had to reduce the whole setup to a rhythm, it would be this:

Start the day in the repo. Read AGENTS.md, the last three session logs for the area you are touching, and the open issues. No agent yet.
Pick one job. Write the brief in prompts/. If the brief takes longer than ten minutes, the job is too vague and the agent will not save you.
Hand the brief to the coding agent. Let it run while you do something only a human can do: code review, design, talking to a teammate, or staring out of a window.
Review the diff the way you would review a junior engineer’s PR. Run the build, the linter, the targeted tests. Reject anything that does not earn its place.
Promote the session log. Prune the scratch repo. Update NOTES.md if the project direction changed.
Stop when the day is done. The agent will be there tomorrow, and so will you.

It is not a glamorous setup. There is no single tool that does all of it. The point is that every part of the wiring is cheap, replaceable, and earns its place by being the thing the team actually uses on a Tuesday afternoon.

That is the whole trick. Build the boring loop first. The model is the easy part.