Networking in modern iOS: typed endpoints, retries/backoff, and observability without bloat

Networking code tends to start simple and then quietly become your app’s least testable, least observable subsystem.

You do not need a “networking layer framework” to fix that.

What you need is:

a typed way to describe requests and decode responses
explicit rules for retries and backoff
enough instrumentation to answer: “what failed, for whom, and how often?”

This post outlines a small set of patterns that stay readable in a product codebase.

1) Model the API as typed endpoints

A typed endpoint is a request description that can build a URLRequest and knows the expected response type.

Keep it boring:

no global mutable state
no stringly typed paths scattered across features
no decoding hidden inside view models

A minimal endpoint definition:

import Foundation

enum HTTPMethod: String {
    case get = "GET"
    case post = "POST"
    case put = "PUT"
    case delete = "DELETE"
}

struct Endpoint<Response: Decodable> {
    var method: HTTPMethod
    var path: String
    var query: [URLQueryItem] = []
    var headers: [String: String] = [:]
    var body: Data? = nil

    func makeRequest(baseURL: URL) throws -> URLRequest {
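        // Join base URL and path first, then attach query items through
        // URLComponents so they are encoded in one place.
        var components = URLComponents(
            url: baseURL.appendingPathComponent(path),
            resolvingAgainstBaseURL: false
        )
        components?.queryItems = query.isEmpty ? nil : query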

        guard let url = components?.url else {
            throw URLError(.badURL)
        }

        var request = URLRequest(url: url)
        request.httpMethod = method.rawValue
        request.httpBody = body
        headers.forEach { request.setValue($1, forHTTPHeaderField: $0) }
        return request
    }
}

Usage stays straightforward:

struct UserDTO: Decodable {
    let id: String
    let email: String
}

extension Endpoint where Response == UserDTO {
    static func user(id: String) -> Self {
        Endpoint(method: .get, path: "/v1/users/\(id)")
    }
}

The point is not purity. The point is to centralize the request shape and response type.
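
The definition above leaves open how body gets filled. Here is a minimal sketch, assuming JSON request bodies. CreateUserBody, the jsonPost helper, and the POST /v1/users route are hypothetical names for illustration, and Endpoint is repeated in trimmed form only so the snippet compiles on its own:

```swift
import Foundation

// Trimmed copies of the types above, so this snippet compiles alone.
enum HTTPMethod: String {
    case get = "GET"
    case post = "POST"
}

struct Endpoint<Response: Decodable> {
    var method: HTTPMethod
    var path: String
    var headers: [String: String] = [:]
    var body: Data? = nil
}

struct UserDTO: Decodable {
    let id: String
    let email: String
}

// Hypothetical request payload; not part of the original API.
struct CreateUserBody: Encodable {
    let email: String
}

extension Endpoint {
    // Hypothetical helper: encodes `payload` as JSON and sets Content-Type.
    static func jsonPost<Payload: Encodable>(
        path: String,
        payload: Payload,
        encoder: JSONEncoder = JSONEncoder()
    ) throws -> Self {
        Endpoint(
            method: .post,
            path: path,
            headers: ["Content-Type": "application/json"],
            body: try encoder.encode(payload)
        )
    }
}

extension Endpoint where Response == UserDTO {
    // Assumed route; the feature still only names the path and the payload.
    static func createUser(email: String) throws -> Self {
        try .jsonPost(path: "/v1/users", payload: CreateUserBody(email: email))
    }
}
```

Encoding happens when the endpoint is created, so an encoding failure surfaces at the call site rather than inside the client.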

2) One API client: decoding, errors, and cancellation

A good client does three things well:

executes a URLRequest
decodes success responses
produces an error that is useful to log and to show

Start with an error type you can reason about:

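enum APIError: Error {
    case transport(URLError)               // connection-level failure reported by URLSession
    case server(status: Int, body: Data?)  // non-2xx status; body kept for logging
    case decoding(Error)                   // 2xx, but the payload did not match Response
    case invalidResponse                   // response was not an HTTPURLResponse
}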

Client implementation:

import Foundation

final class APIClient {
    private let baseURL: URL
    private let session: URLSession
    private let decoder: JSONDecoder

    init(baseURL: URL, session: URLSession = .shared, decoder: JSONDecoder = JSONDecoder()) {
        self.baseURL = baseURL
        self.session = session
        self.decoder = decoder
    }

    func send<Response: Decodable>(_ endpoint: Endpoint<Response>) async throws -> Response {
        let request = try endpoint.makeRequest(baseURL: baseURL)

        do {
            let (data, response) = try await session.data(for: request)
            guard let http = response as? HTTPURLResponse else {
                throw APIError.invalidResponse
            }

            guard (200...299).contains(http.statusCode) else {
                throw APIError.server(status: http.statusCode, body: data)
            }

            do {
                return try decoder.decode(Response.self, from: data)
            } catch {
                throw APIError.decoding(error)
            }
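        } catch let urlError as URLError {
            // Transport failures, including cancellation (.cancelled), arrive as URLError.
            // APIError values thrown above are not URLError, so they pass through unchanged.
            throw APIError.transport(urlError)
        }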
    }
}

This buys you:

consistent error mapping
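cooperative cancellation: cancelling the calling Task cancels the in-flight request, which surfaces as URLError(.cancelled) wrapped in APIError.transport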
a single place to add headers like auth and request IDs
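
To make "useful to log and to show" concrete, here is one possible mapping from APIError to a log line and a user-facing message. It is a sketch under assumptions: logSummary and userMessage are hypothetical helpers, the wording is placeholder copy, and APIError is repeated only so the snippet compiles alone.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // only relevant off Apple platforms
#endif

// Copy of APIError from above, so this snippet compiles on its own.
enum APIError: Error {
    case transport(URLError)
    case server(status: Int, body: Data?)
    case decoding(Error)
    case invalidResponse
}

extension APIError {
    /// Compact summary for logs; logs the body size, not the body (hypothetical helper).
    var logSummary: String {
        switch self {
        case .transport(let urlError):
            return "transport code=\(urlError.code.rawValue)"
        case .server(let status, let body):
            return "server status=\(status) bodyBytes=\(body?.count ?? 0)"
        case .decoding(let underlying):
            return "decoding \(type(of: underlying))"
        case .invalidResponse:
            return "invalidResponse"
        }
    }

    /// Placeholder user-facing copy (hypothetical wording).
    var userMessage: String {
        switch self {
        case .transport(let urlError) where urlError.code == .notConnectedToInternet:
            return "You appear to be offline."
        case .transport, .server:
            return "Something went wrong. Please try again."
        case .decoding, .invalidResponse:
            return "We received an unexpected response."
        }
    }
}
```

Keeping both mappings next to the error type means view models never switch on status codes themselves.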

3) Retries and backoff: decide what is safe

Retries are not an “on/off” feature.

The only correct retry policy is one that encodes which failures are transient and which requests are safe to repeat.

A practical policy:

retry on transport errors like .timedOut and .networkConnectionLost
retry on 502/503/504 with exponential backoff and jitter
do not retry requests with side effects unless they are idempotent

A simple retry wrapper:

struct RetryPolicy {
    var maxAttempts: Int = 3
    var baseDelaySeconds: Double = 0.4

    func shouldRetry(error: Error, attempt: Int, request: URLRequest) -> Bool {
        guard attempt < maxAttempts else { return false }

        // Only retry safe (read-only) methods by default. PUT and DELETE are
        // idempotent in HTTP terms but are deliberately excluded here.
        let method = request.httpMethod?.uppercased()
        let isSafeMethod = (method == "GET" || method == "HEAD")
        guard isSafeMethod else { return false }

        if let api = error as? APIError {
            switch api {
            case .transport(let urlError):
                switch urlError.code {
                case .timedOut, .networkConnectionLost, .notConnectedToInternet, .cannotFindHost, .cannotConnectToHost:
                    return true
                default:
                    return false
                }

            case .server(let status, _):
                return status == 502 || status == 503 || status == 504

            default:
                return false
            }
        }

        return false
    }

    func delaySeconds(attempt: Int) -> Double {
        // Exponential backoff: base * 2^(attempt - 1), plus up to 0.2 s of additive jitter.
        let exp = baseDelaySeconds * pow(2.0, Double(attempt - 1))
        let jitter = Double.random(in: 0...0.2)
        return exp + jitter
    }
}
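
To see what those defaults mean in wall-clock terms, here is the same formula as a free function with the jitter passed in, so the bounds are easy to read off. backoffDelay is a hypothetical helper name; the default base mirrors RetryPolicy above.

```swift
import Foundation

/// Same formula as RetryPolicy.delaySeconds(attempt:), with jitter injected
/// instead of randomized so the lower and upper bounds can be computed.
func backoffDelay(attempt: Int, base: Double = 0.4, jitter: Double) -> Double {
    base * pow(2.0, Double(attempt - 1)) + jitter
}

// With maxAttempts = 3 the client sleeps only after attempts 1 and 2;
// a failure on attempt 3 is rethrown without waiting.
for attempt in 1...2 {
    let low = backoffDelay(attempt: attempt, jitter: 0.0)
    let high = backoffDelay(attempt: attempt, jitter: 0.2)
    print("after attempt \(attempt): wait between \(low)s and \(high)s")
}
```

With the defaults, the wait is roughly 0.4 to 0.6 s after the first failure and 0.8 to 1.0 s after the second, so retries add at most about 1.6 s of sleep on top of the time spent in the requests themselves.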

And an API client method that uses it:

extension APIClient {
    func send<Response: Decodable>(
        _ endpoint: Endpoint<Response>,
        retryPolicy: RetryPolicy
    ) async throws -> Response {
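        // Built only so the policy can inspect the HTTP method;
        // send(_:) below builds its own request for each attempt.
        let request = try endpoint.makeRequest(baseURL: baseURL)

        var attempt = 1  // 1-based: maxAttempts includes the first try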
        while true {
            do {
                return try await send(endpoint)
            } catch {
                if retryPolicy.shouldRetry(error: error, attempt: attempt, request: request) {
                    let delay = retryPolicy.delaySeconds(attempt: attempt)
                    // Task.sleep throws if the task is cancelled, which ends the retry loop.
                    try await Task.sleep(nanoseconds: UInt64(delay * 1_000_000_000))
                    attempt += 1
                    continue
                }
                throw error
            }
        }
    }
}

Concrete failure mode: duplicate side effects caused by retries

A common production incident:

You introduce retries globally.
A POST /purchase times out on a slow cellular network.
The client retries.
The server processes both requests.

Users see duplicate receipts or duplicate credits.

Diagnosis path that works:

Add a client-generated request ID header (for example X-Request-ID) on every request.
Log it on the server alongside the operation identifier (order id, purchase id).
When an incident happens, search the logs for one user with two different X-Request-ID values that map to the same operation within a short time window.

Fix:

for side-effecting endpoints, require an idempotency key (for example Idempotency-Key) and have the server deduplicate
if you cannot guarantee idempotency, do not auto-retry that endpoint
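
On the client, the key property is that the idempotency key is generated once per logical operation and reused on every attempt, while the request ID changes per attempt. A minimal sketch, assuming the server deduplicates on Idempotency-Key; IdempotentOperation is a hypothetical type and api.example.com is a placeholder host:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here off Apple platforms
#endif

/// One logical side-effecting operation, e.g. a single purchase (hypothetical type).
struct IdempotentOperation {
    /// Generated once; every retry of this operation reuses it.
    let idempotencyKey = UUID().uuidString
    let baseRequest: URLRequest

    /// Builds the request for one attempt: stable key, fresh request ID.
    func requestForAttempt() -> URLRequest {
        var request = baseRequest
        request.setValue(idempotencyKey, forHTTPHeaderField: "Idempotency-Key")
        request.setValue(UUID().uuidString, forHTTPHeaderField: "X-Request-ID")
        return request
    }
}

// Usage: two attempts of the same purchase.
var purchase = URLRequest(url: URL(string: "https://api.example.com/v1/purchase")!)
purchase.httpMethod = "POST"

let operation = IdempotentOperation(baseRequest: purchase)
let firstAttempt = operation.requestForAttempt()
let secondAttempt = operation.requestForAttempt()
```

Both attempts carry the same Idempotency-Key, so the server can recognize the second as a repeat, and the differing X-Request-ID values still let you tell the attempts apart in logs.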

4) Observability without bloat: log what you need, not everything

Two goals:

developers can debug individual failures
the team can measure trends (error rate, latency, retry rate)

Add request correlation

At minimum:

X-Request-ID: unique per attempt
X-Session-ID or user identifier: only if your privacy model allows it

Keep request IDs in logs, crash reports, and bug reports.

Record per-request metrics

URLSessionTaskMetrics gives you timing and network details for each task. You can use it to answer:

are we DNS bound?
is TLS handshaking the bottleneck?
did we hit HTTP/2 connection reuse?
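
A debug-only delegate along these lines can answer those three questions per request. It is a sketch for Apple platforms; MetricsLogger and its helpers are hypothetical names, and print stands in for whatever logging you already use.

```swift
import Foundation

/// Debug-only delegate that prints timing for each finished task (hypothetical name).
final class MetricsLogger: NSObject, URLSessionTaskDelegate {
    func urlSession(
        _ session: URLSession,
        task: URLSessionTask,
        didFinishCollecting metrics: URLSessionTaskMetrics
    ) {
        // One transaction per request/response pair; redirects add more entries.
        for transaction in metrics.transactionMetrics {
            let dns = Self.elapsed(transaction.domainLookupStartDate, transaction.domainLookupEndDate)
            let tls = Self.elapsed(transaction.secureConnectionStartDate, transaction.secureConnectionEndDate)
            let total = Self.elapsed(transaction.fetchStartDate, transaction.responseEndDate)
            print(
                "proto=\(transaction.networkProtocolName ?? "?")",
                "reused=\(transaction.isReusedConnection)",
                "dns=\(Self.label(dns))",
                "tls=\(Self.label(tls))",
                "total=\(Self.label(total))"
            )
        }
    }

    /// Seconds between two optional timestamps; nil when either is missing,
    /// which is expected for DNS and TLS on a reused connection.
    static func elapsed(_ start: Date?, _ end: Date?) -> TimeInterval? {
        guard let start = start, let end = end else { return nil }
        return end.timeIntervalSince(start)
    }

    /// Formats a duration in whole milliseconds, or "n/a" when absent.
    static func label(_ seconds: TimeInterval?) -> String {
        guard let seconds = seconds else { return "n/a" }
        return String(format: "%.0fms", seconds * 1000)
    }
}

// Usage: pass the delegate when creating the session the client uses.
let debugSession = URLSession(configuration: .default, delegate: MetricsLogger(), delegateQueue: nil)
```

A reused connection typically reports no DNS or TLS phase, which is why those fields are optional in the output.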

A lightweight verification step you can do this week:

In a debug build, attach a URLSessionTaskDelegate and, in urlSession(_:task:didFinishCollecting:), log fetchStartDate and responseEndDate from metrics.transactionMetrics.first for a representative endpoint.
Run the same flow 10 times on Wi‑Fi and 10 times on cellular.
Compute p50 and p95 durations.
After introducing a retry policy or caching change, repeat and compare.
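
For the p50/p95 step, a nearest-rank percentile is enough at this sample size. A small sketch; percentile is a hypothetical helper, and the sample durations are made-up numbers, not measurements:

```swift
import Foundation

/// Nearest-rank percentile: the smallest sample such that at least
/// `p` percent of all samples are less than or equal to it.
func percentile(_ samples: [Double], _ p: Double) -> Double? {
    guard !samples.isEmpty, (0...100).contains(p) else { return nil }
    let sorted = samples.sorted()
    let rank = Int((p / 100 * Double(sorted.count)).rounded(.up))
    return sorted[max(rank, 1) - 1]
}

// Made-up durations in seconds for 10 runs of one flow.
let wifiRuns: [Double] = [0.21, 0.19, 0.25, 0.22, 0.20, 0.95, 0.23, 0.24, 0.18, 0.26]
let p50 = percentile(wifiRuns, 50)
let p95 = percentile(wifiRuns, 95)
```

With only 10 samples, p95 under this definition is simply the slowest run, so treat it as a rough signal and collect more runs before drawing conclusions about tail latency.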

If you cannot reproduce the difference locally, you likely need server-side instrumentation too.

5) Keep feature code clean: inject the client, do not singleton it

A typed endpoint setup pays off when features stay dumb:

feature owns its endpoint definitions
app layer owns baseURL, auth, session configuration
tests can inject a stub session or a fake client
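
One common way to get a stub session is a URLProtocol subclass registered on the session configuration. A minimal sketch for test targets; StubURLProtocol and makeStubbedSession are hypothetical names, and the mutable static is a test-only shortcut that would need isolation under strict concurrency checking:

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession and URLProtocol live here off Apple platforms
#endif

/// Test double that answers every request with a canned response (hypothetical name).
final class StubURLProtocol: URLProtocol {
    /// Set by the test before the request is made. Test-only shared state.
    static var stub: (status: Int, body: Data) = (200, Data())

    // Intercept every request made through a session that registers this class.
    override class func canInit(with request: URLRequest) -> Bool { true }
    override class func canonicalRequest(for request: URLRequest) -> URLRequest { request }

    override func startLoading() {
        let response = HTTPURLResponse(
            url: request.url!,
            statusCode: Self.stub.status,
            httpVersion: "HTTP/1.1",
            headerFields: nil
        )!
        // Deliver the canned response, then the body, then completion.
        client?.urlProtocol(self, didReceive: response, cacheStoragePolicy: .notAllowed)
        client?.urlProtocol(self, didLoad: Self.stub.body)
        client?.urlProtocolDidFinishLoading(self)
    }

    override func stopLoading() {}
}

/// Session that never touches the network; inject it into the client under test.
func makeStubbedSession() -> URLSession {
    let configuration = URLSessionConfiguration.ephemeral
    configuration.protocolClasses = [StubURLProtocol.self]
    return URLSession(configuration: configuration)
}
```

In a test, set StubURLProtocol.stub to, say, a 503 with an empty body, construct the client with session: makeStubbedSession(), and assert that send throws APIError.server(status: 503, ...). This exercises the real decoding and error-mapping path without a fake client.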

If you want a single rule to keep the layer from expanding forever:

endpoints describe requests
client executes and maps errors
features decide what to do with the result

That separation is what keeps networking maintainable six months later.