Feature flags: safe rollouts without shipping fear
A practical feature flag setup for iOS: separate release, experiment, and kill-switch flags, keep evaluation deterministic, and avoid the cleanup debt that turns rollout safety into product entropy.
Feature flags are useful right up until they become your second configuration system.
That usually happens in the same predictable way:
- flags start as a safe rollout tool
- then product wants experiments
- then support needs kill switches
- then engineering adds local overrides
- then nobody can explain why a screen behaves differently for two users on the same build
The problem is not feature flags themselves. The problem is pretending all flags are the same.
If you want safer releases without turning the app into a haunted house, you need a small system with hard rules.
1. Start by splitting flags into categories
The first mistake is one giant FeatureFlags bag with thirty booleans in it.
Not all flags serve the same purpose. Treat them differently.
A practical split is:
- Release flags: gradually expose already-built functionality.
- Experiment flags: bucket users into variants and measure results.
- Operational flags: kill switches and degradation controls for incidents.
- Developer flags: local-only overrides for testing and QA.
Those categories should have different expectations.
For example:
- release flags should have an owner and an expiry date
- experiment flags should have variant definitions and analytics wiring
- operational flags should default to the safest behavior if config fetch fails
- developer flags should never be read from production backend config
If you mix these together, you get terrible decisions like using an A/B test flag as an emergency outage switch. That works exactly once.
2. Keep flag evaluation boring and deterministic
A flag system should not feel clever. It should be easy to answer:
- what value did this user get?
- why did they get it?
- when can this flag be deleted?
A simple model is enough:
import Foundation

enum FlagKey: String, CaseIterable, Sendable {
    case paywallV2
    case onboardingExperiment
    case disableImageUploads
}

enum FlagValue: Sendable, Equatable {
    case bool(Bool)
    case string(String)
    case int(Int)
}

struct FlagSnapshot: Sendable {
    let values: [FlagKey: FlagValue]
    let fetchedAt: Date
    let source: Source

    enum Source: Sendable {
        case bundledDefaults
        case remoteConfig
        case localOverride
    }
}
And an evaluator that is explicit about precedence:
struct FlagEvaluator: Sendable {
    let localOverrides: [FlagKey: FlagValue]
    let remoteValues: [FlagKey: FlagValue]
    let defaults: [FlagKey: FlagValue]

    func value(for key: FlagKey) -> FlagValue {
        if let local = localOverrides[key] { return local }
        if let remote = remoteValues[key] { return remote }
        return defaults[key] ?? .bool(false)
    }

    func isEnabled(_ key: FlagKey) -> Bool {
        guard case .bool(let enabled) = value(for: key) else { return false }
        return enabled
    }
}
The interesting part is not the code. The interesting part is the contract:
- local override wins for internal testing
- remote config wins over bundled defaults
- defaults are safe and shippable
- evaluation does not depend on mutable view state
If the answer can change halfway through a screen because three objects re-read config differently, you are not doing flags. You are doing runtime chaos.
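The contract can be reduced to a single expression. This sketch mirrors the FlagEvaluator precedence above in isolation, using plain optionals for brevity:

```swift
// First non-nil layer wins: local override, then remote, then bundled default.
func resolve(local: Bool?, remote: Bool?, bundledDefault: Bool) -> Bool {
    local ?? remote ?? bundledDefault
}

// A QA override of false beats a remote rollout value of true:
let qaValue = resolve(local: false, remote: true, bundledDefault: false)   // false
// With no override, remote beats the bundled default:
let liveValue = resolve(local: nil, remote: true, bundledDefault: false)   // true
```

If you can explain the whole system with one nil-coalescing chain, you are in good shape.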
3. Use typed access, not string soup
Stringly-typed flags feel fast until someone renames new_checkout_flow to checkout_v2, misses one usage, and quietly ships dead logic.
Typed keys are not glamorous, but they prevent a lot of accidental mess.
Bad:
if remoteConfig.bool(forKey: "new_checkout_flow") {
    // ...
}
Better:
if flags.isEnabled(.paywallV2) {
    // ...
}
That buys you a few important things:
- one place to inventory every active flag
- compile-time help during cleanup
- easier code search when the rollout is done
- a better chance of noticing duplicated flags with slightly different names
Feature flags already increase branching. Do not also make the names sloppy.
4. Separate fetch from evaluation
Another common mistake: UI code asks the remote config SDK directly whether something is enabled.
That leaks infrastructure decisions into product code and makes testing annoying.
Instead, split the system into three parts:
- Provider fetches raw config.
- Store persists the latest snapshot.
- Evaluator answers product-level questions.
Something like this is plenty:
protocol FlagProvider: Sendable {
    func fetch() async throws -> [FlagKey: FlagValue]
}

protocol FlagStore: Sendable {
    func load() async -> FlagSnapshot?
    func save(_ snapshot: FlagSnapshot) async
}

actor FlagsService {
    private let provider: FlagProvider
    private let store: FlagStore
    private let defaults: [FlagKey: FlagValue]
    private var snapshot: FlagSnapshot

    init(
        provider: FlagProvider,
        store: FlagStore,
        defaults: [FlagKey: FlagValue]
    ) {
        self.provider = provider
        self.store = store
        self.defaults = defaults
        self.snapshot = FlagSnapshot(values: defaults, fetchedAt: .distantPast, source: .bundledDefaults)
    }

    func refresh() async {
        do {
            let values = try await provider.fetch()
            let snapshot = FlagSnapshot(values: values, fetchedAt: .now, source: .remoteConfig)
            self.snapshot = snapshot
            await store.save(snapshot)
        } catch {
            // Keep the last known good snapshot. Reliability beats drama.
        }
    }

    func evaluator(localOverrides: [FlagKey: FlagValue] = [:]) -> FlagEvaluator {
        FlagEvaluator(
            localOverrides: localOverrides,
            remoteValues: snapshot.values,
            defaults: defaults
        )
    }
}
Now the rest of the app depends on FlagEvaluator, not on your config vendor.
That matters more than it sounds. Vendors change. Your app architecture should not.
5. Design for stale config, because it will be stale
A lot of teams talk about “real-time feature flags” as if every app session has perfect network and instant config delivery.
On mobile, assume this instead:
- the app may launch offline
- config may fail to fetch
- the user may keep the app installed for months without updating
- the last good snapshot might be hours or days old
So every important flag needs a stale-config policy.
Release flags
Usually safe to use the last known value, provided the default also produces a coherent product.
Experiment flags
Need stable bucketing. A user should not bounce between A and B because config refresh timing changed.
Operational flags
Need explicit safe behavior.
For example, if uploads are causing crashes in one subsystem, this is a good operational flag:
- remote value true → disable uploads
- default value false → feature works when there is no incident
- incident runbook explains when to flip it and what user impact to expect
What you do not want is an operational flag whose safe value is unclear because half the app assumes the opposite default.
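These policies can be encoded instead of living as tribal knowledge. A minimal sketch, with illustrative thresholds rather than recommendations:

```swift
import Foundation

// Per-category staleness policy. The 24-hour cutoff for operational flags
// is an illustrative assumption, not a recommendation.
enum FlagCategory {
    case release, experiment, operational
}

func shouldTrustSnapshot(fetchedAt: Date, category: FlagCategory, now: Date = .now) -> Bool {
    let age = now.timeIntervalSince(fetchedAt)
    switch category {
    case .release:
        return true                // last known value is acceptable
    case .experiment:
        return true                // bucketing is persisted separately, so staleness is fine
    case .operational:
        return age < 24 * 60 * 60  // after a day, fall back to safe defaults
    }
}
```

The specific numbers matter less than having the decision written down per category.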
6. Keep bucketing stable or your experiments are fiction
If you run experiments, assign variants from a stable identifier and persist the result.
Do not:
- bucket from something ephemeral like session start
- rebucket every launch
- let local time or fetch timing affect the variant
A minimal deterministic bucketing approach:
import Foundation
import CryptoKit

func bucket(userID: String, experiment: String, modulo: Int = 100) -> Int {
    let input = Data("\(experiment):\(userID)".utf8)
    let digest = SHA256.hash(data: input)
    let value = digest.prefix(8).reduce(0 as UInt64) { partial, byte in
        (partial << 8) | UInt64(byte)
    }
    return Int(value % UInt64(modulo))
}

func variant(for userID: String) -> String {
    bucket(userID: userID, experiment: "onboardingExperiment") < 50 ? "control" : "treatment"
}
It does not need to be complicated. It needs to be stable.
If your experiment assignment is unstable, your analytics are theater with charts.
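One way to guarantee stability is to persist the first computed variant so later config refreshes cannot move the user. A sketch, where the UserDefaults storage and the "experiment.variant." key prefix are assumptions:

```swift
import Foundation
import CryptoKit

// Compute the variant once, store it, and return the stored value forever after.
func persistedVariant(userID: String, experiment: String, defaults: UserDefaults = .standard) -> String {
    let key = "experiment.variant.\(experiment)"
    if let stored = defaults.string(forKey: key) {
        return stored
    }
    // Same deterministic hashing as the bucket(userID:experiment:) function above.
    let input = Data("\(experiment):\(userID)".utf8)
    let digest = SHA256.hash(data: input)
    let value = digest.prefix(8).reduce(0 as UInt64) { ($0 << 8) | UInt64($1) }
    let variant = value % 100 < 50 ? "control" : "treatment"
    defaults.set(variant, forKey: key)
    return variant
}
```

Persisting the assignment also makes it visible in diagnostics, which helps when analytics and observed behavior disagree.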
7. Put kill switches on the edges, not in every view
Operational flags should degrade systems, not infect every screen with incident logic.
For example, imagine image uploads are causing memory spikes. Do this:
import Foundation

// Minimal supporting types so the example compiles.
struct PendingUpload: Sendable {
    let id: UUID
    let fileURL: URL
}

enum UploadError: Error {
    case temporarilyDisabled
}

struct UploadPolicy: Sendable {
    let uploadsEnabled: Bool
    let maxConcurrentUploads: Int
}

final class UploadCoordinator {
    private let policy: UploadPolicy

    init(policy: UploadPolicy) {
        self.policy = policy
    }

    func enqueue(_ item: PendingUpload) throws {
        guard policy.uploadsEnabled else {
            throw UploadError.temporarilyDisabled
        }
        // normal queueing flow
    }
}
Do not do this in fifteen places:
- hide one button here
- show one warning there
- special-case retries elsewhere
- bypass the queue in another code path because “that screen is different”
If a flag changes system behavior, put it near the system boundary.
That gives you one place to test and one place to reason about during an incident.
8. Add observability or you are guessing
You need to be able to answer these questions without archaeology:
- Which flags were active for this user?
- Which snapshot version did the app evaluate?
- Was the value remote, default, or local override?
- When was the snapshot fetched?
You do not need to log every flag on every tap. You do need enough context in analytics and bug reports to reconstruct behavior.
A practical setup:
- include key rollout flags in screen-view and conversion events
- attach snapshot metadata to debug logs
- expose active flags in an internal debug screen
- include flag state in support-exported diagnostics if privacy allows it
The debug screen is boring, and that is precisely why it is valuable.
When QA says “I can’t reproduce it,” a flag panel usually explains why faster than another meeting will.
9. Write deletion into the process
Feature flags are a temporary safety tool.
Teams get into trouble when they treat them as permanent architecture.
Every new flag should have:
- an owner
- a reason it exists
- a creation date
- a delete condition
- a delete-by date if possible
I like a tiny metadata structure for this:
struct FlagMetadata: Sendable {
    let owner: String
    let category: Category
    let createdAt: Date
    let deleteWhen: String

    enum Category: String, Sendable {
        case release
        case experiment
        case operational
        case developer
    }
}
This can live alongside the typed flag definitions or in a simple internal document. The exact format matters less than forcing the question.
If nobody knows when a flag should die, it will survive long enough to distort the codebase.
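Forcing the question can even be automated. A sketch of a small audit over hypothetical FlagRecord entries, which mirror the metadata above plus an optional delete-by date:

```swift
import Foundation

// Illustrative record type: name, owner, and an optional deadline.
struct FlagRecord {
    let name: String
    let owner: String
    let deleteBy: Date?
}

// Returns human-readable descriptions of flags past their delete-by date.
func overdueFlags(_ records: [FlagRecord], now: Date = .now) -> [String] {
    records
        .filter { record in
            guard let deadline = record.deleteBy else { return false }
            return deadline < now
        }
        .map { "\($0.name) (owner: \($0.owner))" }
}
```

Run something like this in CI or a weekly job and cleanup stops depending on anyone's memory.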
The cleanup rule worth enforcing
When a rollout reaches 100% and the feature is healthy, the next task is not “celebrate.”
The next task is:
- delete the old path
- delete the flag
- delete the analytics branch that only existed for rollout comparison
- delete the local override if it is no longer useful
Otherwise “safe rollout” turns into “permanent branch complexity with a nice origin story.”
10. Keep local overrides behind a deliberate debug surface
Developers and QA need local overrides. Production users do not.
A decent setup usually means:
- debug-only flag menu
- optional persistence for local testing
- clear reset button
- obvious label showing an override is active
The important bit is social, not technical: local overrides should be visible.
Nothing wastes time like debugging a “production issue” that only exists because someone left paywallV2 = true in a hidden settings screen three days ago.
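A minimal override store with an explicit reset is enough. In this sketch the UserDefaults key is an assumption, and in a real app the whole type would sit behind #if DEBUG so it cannot ship to production:

```swift
import Foundation

// Debug-only local override storage with a single, obvious reset path.
struct DebugOverrides {
    private static let key = "debug.flag.overrides"
    let defaults: UserDefaults

    func set(_ flag: String, enabled: Bool) {
        var current = defaults.dictionary(forKey: Self.key) as? [String: Bool] ?? [:]
        current[flag] = enabled
        defaults.set(current, forKey: Self.key)
    }

    // Everything currently overridden, for display in the debug flag menu.
    func all() -> [String: Bool] {
        defaults.dictionary(forKey: Self.key) as? [String: Bool] ?? [:]
    }

    func reset() {
        defaults.removeObject(forKey: Self.key)
    }
}
```

Because every override lives under one key, the debug screen can render an "overrides active" banner whenever all() is non-empty.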
11. The architecture that usually holds up
If I had to keep it compact, I would use this shape:
- Bundled defaults shipped with the app
- Remote snapshot fetched on launch and occasionally refreshed
- Typed evaluator used by product code
- Debug overrides layered only in internal builds
- Metadata + cleanup policy to keep the system small
That is enough for most iOS teams.
You do not need a feature-management platform before you have feature-management discipline.
12. A blunt checklist before adding a new flag
Before introducing a flag, ask:
- Is this a release flag, experiment, operational switch, or developer override?
- What is the safe default if config never arrives?
- Who owns it?
- How will I debug its current value on a real device?
- When does it get deleted?
- Can the branch live at a system boundary instead of leaking through the UI?
If you cannot answer those quickly, the flag is not ready.
Final take
Feature flags are worth having because they reduce release risk.
They stop being worth having when they quietly become unmanaged branching infrastructure.
Keep the categories explicit. Keep evaluation deterministic. Put kill switches at boundaries. Delete flags aggressively.
That is how you get the upside: safer rollouts, calmer incidents, and fewer Friday-night “why is this user seeing that screen?” conversations.