Image loading on iOS: caching, decoding, and the mistakes that make scrolling worse
A practical image-loading setup for iOS: cache the right thing, decode off the main thread, control request churn, and stop blaming scrolling jank on the collection view.
Image loading bugs love bad diagnosis.
A feed stutters, memory climbs, cells flicker, and somebody says the list is slow.
Usually the list is fine.
The problem is further upstream:
- requests are being restarted too often
- large images are decoded on the main thread
- the cache stores the wrong representation
- views are recreated in ways that defeat reuse
- cancellation is missing, so off-screen work keeps running
If you fix those boundaries, scrolling usually gets boring again, which is exactly what you want.
1. Start with the pipeline, not the view
An image-loading system has four separate jobs:
- request the bytes
- cache something useful
- decode and optionally downsample the image
- deliver the result to the UI with cancellation
Teams often blur those together inside a SwiftUI view or a UIKit cell subclass. That works for a prototype and then quietly turns into a performance tax.
A better split is:
- loader for transport and request deduplication
- memory cache for fast reuse
- disk/HTTP cache for network efficiency
- decoder for downsampling and decompression
- UI adapter for lifecycle and cancellation
That separation is not architecture cosplay. It is what lets you answer basic production questions like:
- are we fetching too much?
- are we decoding too much?
- are we holding too much in memory?
- are off-screen images still doing work?
If you cannot answer those, you do not have an image pipeline. You have vibes.
2. Cache the right thing, not just anything
The first mistake is talking about “the cache” as if it were one thing.
It is usually two different layers with different jobs.
HTTP or disk cache: avoid downloading again
URLCache is for response reuse.
It helps when:
- the server sends sane cache headers
- the same URL is requested repeatedly
- you want to avoid another network hop
It does not solve decode cost, resize cost, or first-render smoothness by itself.
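When the defaults are not enough, the response cache can be sized explicitly. A minimal sketch; the capacity numbers here are illustrative assumptions, not tuned recommendations:

```swift
import Foundation

// Sketch of explicit URLCache sizing. The capacities are
// illustrative, not recommendations.
let responseCache = URLCache(
    memoryCapacity: 20 * 1024 * 1024,  // ~20 MB of responses in RAM
    diskCapacity: 200 * 1024 * 1024,   // ~200 MB of responses on disk
    directory: nil                     // default cache location
)

let configuration = URLSessionConfiguration.default
configuration.urlCache = responseCache
configuration.requestCachePolicy = .useProtocolCachePolicy

let session = URLSession(configuration: configuration)
```

Note that this still only changes how often bytes are re-downloaded, not how expensive they are to display.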
Memory cache: avoid re-decoding and reprocessing
For scrolling performance, the more valuable cache is often the in-memory image cache holding a display-ready result.
That usually means one of these:
- a downsampled UIImage sized for the target surface
- a processed variant for a known thumbnail size
- occasionally the raw original if you truly reuse it at full size
Caching the original 3000 px image for a 72 pt avatar is wasteful. You save network time and then lose it again decoding a giant bitmap for no reason.
A more honest rule is:
- cache original responses on disk
- cache display-sized images in memory
Those are different assets, even when they came from the same URL.
3. Downsampling beats brute-force decoding
A lot of iOS image pain comes from this simple mismatch:
- the server returns a large image
- the app shows a much smaller image
- the app still decodes the full-resolution bitmap
That burns CPU, memory bandwidth, and RAM.
If the image is only ever shown in a small container, downsample before creating the final image.
```swift
import Foundation
import ImageIO
import UIKit

enum ImageDecoder {
    static func downsampledImage(
        from data: Data,
        maxPixelSize: Int,
        scale: CGFloat = UIScreen.main.scale
    ) -> UIImage? {
        let options: [CFString: Any] = [
            kCGImageSourceShouldCache: false
        ]
        guard let source = CGImageSourceCreateWithData(data as CFData, options as CFDictionary) else {
            return nil
        }
        let downsampleOptions: [CFString: Any] = [
            kCGImageSourceCreateThumbnailFromImageAlways: true,
            kCGImageSourceCreateThumbnailWithTransform: true,
            kCGImageSourceShouldCacheImmediately: true,
            kCGImageSourceThumbnailMaxPixelSize: Int(CGFloat(maxPixelSize) * scale)
        ]
        guard let image = CGImageSourceCreateThumbnailAtIndex(
            source,
            0,
            downsampleOptions as CFDictionary
        ) else {
            return nil
        }
        return UIImage(cgImage: image)
    }
}
```
That one change often matters more than adding another cache layer.
Use the rendered size as the input, not the original image dimensions.
Examples:
- 44 pt avatar at 3x scale, roughly 132 px max
- 120 pt card thumbnail at 3x scale, roughly 360 px max
- full-width detail hero, maybe much larger, but still bounded
If every surface uses the same original asset at different sizes, model those as separate cached variants. Pretending one bitmap fits all is how memory graphs get ugly.
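The point-to-pixel arithmetic is trivial but worth centralizing so every surface computes its decode budget the same way. A sketch; the function name and round-up policy are mine:

```swift
import Foundation

// Convert a layout size in points into a decode budget in pixels.
// The name and rounding policy are illustrative assumptions.
func maxPixelSize(points: CGFloat, scale: CGFloat) -> Int {
    Int((points * scale).rounded(.up))
}

maxPixelSize(points: 44, scale: 3)   // 132, the avatar example above
maxPixelSize(points: 120, scale: 3)  // 360, the card thumbnail example
```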
4. Decode off the main thread, or the scroll hitch is your fault
Image decode and decompression are not free.
When you create a UIImage, part of the real cost may be deferred until draw time. If that first draw happens on the main thread during fast scrolling, congratulations, you just scheduled jank directly into the user experience.
A safer approach is:
- fetch bytes asynchronously
- decode/downsample off the main actor
- publish a ready-to-draw image back to the UI
A small loader can do that cleanly.
```swift
import UIKit

actor ImagePipeline {
    private let session: URLSession
    private let cache = NSCache<NSURL, UIImage>()
    private var tasks: [URL: Task<UIImage, Error>] = [:]

    init(session: URLSession = .shared) {
        self.session = session
        cache.countLimit = 200
    }

    func image(for url: URL, maxPixelSize: Int) async throws -> UIImage {
        if let cached = cache.object(forKey: url as NSURL) {
            return cached
        }
        if let existing = tasks[url] {
            return try await existing.value
        }
        let task = Task<UIImage, Error> {
            defer { Task { await self.clearTask(for: url) } }
            let (data, _) = try await session.data(from: url)
            guard let image = ImageDecoder.downsampledImage(
                from: data,
                maxPixelSize: maxPixelSize
            ) else {
                throw URLError(.cannotDecodeContentData)
            }
            cache.setObject(image, forKey: url as NSURL)
            return image
        }
        tasks[url] = task
        return try await task.value
    }

    private func clearTask(for url: URL) {
        tasks[url] = nil
    }
}
```
This example keeps the important behavior in one place:
- memory reuse
- request deduplication
- background decode work
It is not complete production code, but it is the right shape.
5. Request deduplication matters more than people think
One fast way to waste resources is letting five visible rows ask for the same image and start five separate tasks.
That happens more often than teams admit:
- the same avatar appears in multiple places
- a cell is recreated during state churn
- a prefetch path and a visible render path both fetch
- retry logic starts a new request before the old one is observed
Deduplication is cheap leverage.
If a request for a URL is already in flight, later callers should usually await the same task.
That buys you:
- less radio and CPU usage
- fewer duplicate decodes
- more stable scroll performance under churn
The tricky bit is the cache key.
If the same URL is rendered into multiple sizes, keying only by URL may be wrong for the memory cache. In that case, the key should include the variant too, for example:
- URL + target pixel size
- URL + processing mode
- URL + scale class
Disk cache keys and memory cache keys do not have to be identical. Trying to force one universal cache identity is usually a bad compromise.
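One hedged sketch of that split, where the type name and key formats are invented for illustration: the memory key carries the variant, the disk key does not.

```swift
import Foundation

// Illustrative sketch: one request identity, two cache keys.
struct ImageVariantKey: Hashable {
    let url: URL
    let maxPixelSize: Int

    // Memory cache key: variant-aware, so a 132 px avatar and a
    // 360 px thumbnail from the same URL are distinct entries.
    var memoryKey: String {
        "\(url.absoluteString)#\(maxPixelSize)"
    }

    // Disk/HTTP cache key: the original response is shared across
    // variants, so the URL alone is enough.
    var diskKey: String {
        url.absoluteString
    }
}
```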
6. Cancellation is part of correctness, not polish
If an image request outlives the view that asked for it, the work is often pointless.
In a fast-scrolling feed, off-screen work should be canceled aggressively.
SwiftUI makes it easy to forget this because .task(id:) feels convenient.
It is convenient. It is also easy to misuse.
A reasonable pattern is:
```swift
import SwiftUI

struct RemoteThumbnail: View {
    let url: URL
    let pipeline: ImagePipeline

    @State private var image: UIImage?

    var body: some View {
        Group {
            if let image {
                Image(uiImage: image)
                    .resizable()
                    .scaledToFill()
            } else {
                Color.secondary.opacity(0.12)
            }
        }
        .task(id: url) {
            do {
                image = try await pipeline.image(for: url, maxPixelSize: 240)
            } catch is CancellationError {
                // expected during fast scroll churn
            } catch {
                image = nil
            }
        }
    }
}
```
That is fine as long as the surrounding view identity is stable.
If the row identity is unstable, the task will restart constantly and your pipeline will look guilty for a bug that started in the list diffing layer.
This is why image loading and list performance are usually entangled. The image system cannot save a view tree that keeps pretending every row is new.
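As a sketch of stable identity, assuming the ImagePipeline and RemoteThumbnail from earlier, and a hypothetical Identifiable model named FeedItem:

```swift
import SwiftUI

// Hypothetical feed model. The point is the stable, server-backed id.
struct FeedItem: Identifiable {
    let id: String
    let imageURL: URL
}

struct FeedList: View {
    let items: [FeedItem]
    let pipeline: ImagePipeline

    var body: some View {
        // Stable identity: each row keeps its .task(id:) state across
        // diffs instead of restarting the load on every update.
        List(items) { item in
            RemoteThumbnail(url: item.imageURL, pipeline: pipeline)
        }
    }
}
```

If `id` were derived from array position or regenerated per refresh, every diff would look like new rows and every task would restart.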
7. Prefetching helps only when the rest of the pipeline is sane
Teams love saying “let’s add prefetching” as if it is automatically advanced.
It is not advanced. It is just easy to do badly.
Prefetching helps when:
- the next items are predictable
- requests are cancelable
- decoded variants are reused soon after
- memory pressure stays under control
Prefetching hurts when:
- the app fetches far more than the user will see
- decoded images crowd out visible ones
- the prefetch path duplicates visible work
- you cannot distinguish speculative work from demanded work
A blunt rule I like:
- make on-demand loading correct
- add request deduplication
- measure visible hitching and miss rate
- only then add small-window prefetching
If step 1 is shaky, step 4 just makes the bug happen earlier.
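Once those steps hold, a speculative layer on top of the ImagePipeline shown earlier can stay very small. A sketch; the class name and the low-priority policy are my assumptions:

```swift
import Foundation

// Sketch of cancelable, speculative prefetching on top of the
// ImagePipeline from earlier. Names and policy are illustrative.
final class ImagePrefetcher {
    private let pipeline: ImagePipeline
    private var tasks: [URL: Task<Void, Never>] = [:]

    init(pipeline: ImagePipeline) {
        self.pipeline = pipeline
    }

    func prefetch(_ urls: [URL], maxPixelSize: Int) {
        for url in urls where tasks[url] == nil {
            // Low priority keeps speculative work behind demanded work;
            // the pipeline's in-flight registry deduplicates against
            // the visible render path.
            tasks[url] = Task(priority: .low) {
                _ = try? await pipeline.image(for: url, maxPixelSize: maxPixelSize)
            }
        }
    }

    func cancelPrefetch(_ urls: [URL]) {
        for url in urls {
            tasks[url]?.cancel()
            tasks[url] = nil
        }
    }
}
```

This is the shape that plugs naturally into `UICollectionViewDataSourcePrefetching`, which hands you exactly these two calls: upcoming index paths and canceled ones.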
8. Avoid “generic image loader” abstractions that hide the expensive parts
A lot of codebases end up with a beautiful API and a mediocre pipeline.
Something like this:
```swift
protocol ImageLoading {
    func loadImage(from url: URL) async throws -> UIImage
}
```
Nice signature. Missing half the important decisions.
Where do size variants live? How is cancellation handled? What is cached, original or processed? How are failures and cache hits observed? Can the caller opt into lower priority prefetch work?
A generic abstraction is fine if it still models the real constraints.
A better shape is often closer to this:
```swift
struct ImageRequest: Hashable, Sendable {
    let url: URL
    let maxPixelSize: Int
}
```
Then the pipeline accepts ImageRequest, not just URL.
That one change forces the code to acknowledge that image loading is not only about transport. It is about the final render contract.
9. Instrument the pipeline or stop guessing
If image loading is important to the app, add basic observability.
You do not need a monitoring startup costume.
You do need a few numbers:
- memory-cache hit rate
- in-flight request count
- average decode time
- cancellation count
- average image byte size
- number of oversized assets hitting small surfaces
Even lightweight signposts help.
```swift
import os

private let log = OSLog(subsystem: "dev.vburojevic.website", category: "image-pipeline")

func measureDecode<T>(_ block: () throws -> T) rethrows -> T {
    let signpostID = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: "Decode", signpostID: signpostID)
    defer { os_signpost(.end, log: log, name: "Decode", signpostID: signpostID) }
    return try block()
}
```
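Alongside signposts, a few counters answer the hit-rate question directly. This tiny struct is an illustrative shape, not a real API:

```swift
// Illustrative metrics shape: count the events the pipeline already
// knows about and derive the hit rate from them.
struct ImagePipelineMetrics {
    var memoryHits = 0
    var memoryMisses = 0
    var cancellations = 0

    var hitRate: Double {
        let total = memoryHits + memoryMisses
        return total == 0 ? 0 : Double(memoryHits) / Double(total)
    }
}
```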
Now Instruments can show you whether the problem is:
- network latency
- repeated decode work
- oversized source images
- main-thread hitching during display
Without that, teams tend to cargo-cult fixes from old Slack threads.
10. The common failure patterns are boring, which is good news
The good news is that most image-loading problems are not exotic.
They are usually one of these:
1. Full-resolution decoding for tiny surfaces
Symptom:
- memory spikes
- scroll hitches when images first appear
Fix:
- downsample to display size
- cache the processed variant
2. No request deduplication
Symptom:
- duplicate network traffic
- repeated decode work
- stutter during rapid list updates
Fix:
- keep an in-flight task registry keyed by request variant
3. Work continues after the view disappears
Symptom:
- wasted network and CPU
- memory churn during fast scrolls
Fix:
- honor cancellation from view lifecycle
- separate speculative prefetch tasks from demanded work
4. Stable cache on disk, unstable cache in memory
Symptom:
- good network behavior but still janky rendering
Fix:
- store display-ready images in memory
- stop assuming URLCache is enough
5. The list diffing layer restarts everything
Symptom:
- image flicker
- repeated task restarts
- blame unfairly assigned to the image pipeline
Fix:
- fix row identity and state ownership first
11. A production rule worth keeping
If you only keep one rule from this post, keep this one:
Optimize for rendered pixels, not downloaded bytes.
Downloaded bytes matter for bandwidth. Rendered pixels are what decide decode cost, memory pressure, and scroll smoothness.
That framing leads to better decisions:
- variant-aware caching
- downsampling before display
- fewer giant images living in RAM for tiny views
- less pointless work when cells churn
It also stops the classic mistake of “we added caching, why is scrolling still bad?”
Because caching the wrong representation still gives you the wrong bottleneck.
12. The practical baseline I would ship
For most product apps, a sane baseline is:
- URLSession with normal HTTP caching behavior
- a memory cache keyed by ImageRequest
- background downsampling to target pixel size
- in-flight request deduplication
- aggressive cancellation for off-screen work
- lightweight metrics for hit rate, decode time, and cancellations
That setup is not fancy.
It is enough.
And that is the point. Image loading should feel invisible when it is healthy.
If users are noticing it, the bug is rarely that you need a more glamorous framework.
You probably just need to stop doing expensive work at the worst possible moment.