Image loading on iOS: caching, decoding, and the mistakes that make scrolling worse
A practical image-loading setup for iOS: cache the right thing, decode off the main thread, control request churn, and stop blaming scrolling jank on the collection view.
Image loading bugs love bad diagnosis.
A feed stutters, memory climbs, cells flicker, and somebody says the list is slow.
Usually the list is fine.
The problem is further upstream:
- requests are being restarted too often
- large images are decoded on the main thread
- the cache stores the wrong representation
- views are recreated in ways that defeat reuse
- cancellation is missing, so off-screen work keeps running
If you fix those boundaries, scrolling usually gets boring again, which is exactly what you want.
1. Start with the pipeline, not the view
An image-loading system has four separate jobs:
- request the bytes
- cache something useful
- decode and optionally downsample the image
- deliver the result to the UI with cancellation
Teams often blur those together inside a SwiftUI view or a UIKit cell subclass. That works for a prototype and then quietly turns into a performance tax.
A better split is:
- loader for transport and request deduplication
- memory cache for fast reuse
- disk/HTTP cache for network efficiency
- decoder for downsampling and decompression
- UI adapter for lifecycle and cancellation
That separation is not architecture cosplay. It is what lets you answer basic production questions like:
- are we fetching too much?
- are we decoding too much?
- are we holding too much in memory?
- are off-screen images still doing work?
If you cannot answer those, you do not have an image pipeline. You have vibes.
2. Cache the right thing, not just anything
The first mistake is talking about “the cache” as if it were one thing.
It is usually two different layers with different jobs.
HTTP or disk cache: avoid downloading again
URLCache is for response reuse.
It helps when:
- the server sends sane cache headers
- the same URL is requested repeatedly
- you want to avoid another network hop
It does not solve decode cost, resize cost, or first-render smoothness by itself.
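When the defaults are not enough, the response cache can be sized explicitly. A minimal sketch; the capacity numbers here are illustrative assumptions, not tuned recommendations:

```swift
import Foundation

// Sketch of explicit URLCache sizing. The capacities are
// illustrative, not recommendations.
let responseCache = URLCache(
    memoryCapacity: 20 * 1024 * 1024,  // ~20 MB of responses in RAM
    diskCapacity: 200 * 1024 * 1024,   // ~200 MB of responses on disk
    directory: nil                     // default cache location
)

let configuration = URLSessionConfiguration.default
configuration.urlCache = responseCache
configuration.requestCachePolicy = .useProtocolCachePolicy

let session = URLSession(configuration: configuration)
```

Note that this still only changes how often bytes are re-downloaded, not how expensive they are to display.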
Memory cache: avoid re-decoding and reprocessing
For scrolling performance, the more valuable cache is often the in-memory image cache holding a display-ready result.
That usually means one of these:
- a downsampled UIImage sized for the target surface
- a processed variant for a known thumbnail size
- occasionally the raw original if you truly reuse it at full size
Caching the original 3000 px image for a 72 pt avatar is wasteful. You save network time and then lose it again decoding a giant bitmap for no reason.
A more honest rule is:
- cache original responses on disk
- cache display-sized images in memory
Those are different assets, even when they came from the same URL.
3. Downsampling beats brute-force decoding
A lot of iOS image pain comes from this simple mismatch:
- the server returns a large image
- the app shows a much smaller image
- the app still decodes the full-resolution bitmap
That burns CPU, memory bandwidth, and RAM.
If the image is only ever shown in a small container, downsample before creating the final image.
```swift
import Foundation
import ImageIO
import UIKit

enum ImageDecoder {
    static func downsampledImage(
        from data: Data,
        maxPixelSize: Int,
        scale: CGFloat = UIScreen.main.scale
    ) -> UIImage? {
        let options: [CFString: Any] = [
            kCGImageSourceShouldCache: false
        ]
        guard let source = CGImageSourceCreateWithData(data as CFData, options as CFDictionary) else {
            return nil
        }
        let downsampleOptions: [CFString: Any] = [
            kCGImageSourceCreateThumbnailFromImageAlways: true,
            kCGImageSourceCreateThumbnailWithTransform: true,
            kCGImageSourceShouldCacheImmediately: true,
            kCGImageSourceThumbnailMaxPixelSize: Int(CGFloat(maxPixelSize) * scale)
        ]
        guard let image = CGImageSourceCreateThumbnailAtIndex(
            source,
            0,
            downsampleOptions as CFDictionary
        ) else {
            return nil
        }
        return UIImage(cgImage: image)
    }
}
```
That one change often matters more than adding another cache layer.
Use the rendered size as the input, not the original image dimensions.
Examples:
- 44 pt avatar at 3x scale, roughly 132 px max
- 120 pt card thumbnail at 3x scale, roughly 360 px max
- full-width detail hero, maybe much larger, but still bounded
If every surface uses the same original asset at different sizes, model those as separate cached variants. Pretending one bitmap fits all is how memory graphs get ugly.
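The point-to-pixel arithmetic is trivial but worth centralizing so every surface computes its decode budget the same way. A sketch; the function name and round-up policy are mine:

```swift
import Foundation

// Convert a layout size in points into a decode budget in pixels.
// The name and rounding policy are illustrative assumptions.
func maxPixelSize(points: CGFloat, scale: CGFloat) -> Int {
    Int((points * scale).rounded(.up))
}

maxPixelSize(points: 44, scale: 3)   // 132, the avatar example above
maxPixelSize(points: 120, scale: 3)  // 360, the card thumbnail example
```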
4. Decode off the main thread, or the scroll hitch is your fault
Image decode and decompression are not free.
When you create a UIImage, part of the real cost may be deferred until draw time. If that first draw happens on the main thread during fast scrolling, congratulations, you just scheduled jank directly into the user experience.
A safer approach is:
- fetch bytes asynchronously
- decode/downsample off the main actor
- publish a ready-to-draw image back to the UI
A small loader can do that cleanly.
```swift
import UIKit

actor ImagePipeline {
    private let session: URLSession
    private let cache = NSCache<NSURL, UIImage>()
    private var tasks: [URL: Task<UIImage, Error>] = [:]

    init(session: URLSession = .shared) {
        self.session = session
        cache.countLimit = 200
    }

    func image(for url: URL, maxPixelSize: Int) async throws -> UIImage {
        if let cached = cache.object(forKey: url as NSURL) {
            return cached
        }
        if let existing = tasks[url] {
            return try await existing.value
        }
        let task = Task<UIImage, Error> {
            defer { Task { await self.clearTask(for: url) } }
            let (data, _) = try await session.data(from: url)
            guard let image = ImageDecoder.downsampledImage(
                from: data,
                maxPixelSize: maxPixelSize
            ) else {
                throw URLError(.cannotDecodeContentData)
            }
            cache.setObject(image, forKey: url as NSURL)
            return image
        }
        tasks[url] = task
        return try await task.value
    }

    private func clearTask(for url: URL) {
        tasks[url] = nil
    }
}
```
This example keeps the important behavior in one place:
- memory reuse
- request deduplication
- background decode work
It is not complete production code, but it is the right shape.
5. Request deduplication matters more than people think
One fast way to waste resources is letting five visible rows ask for the same image and start five separate tasks.
That happens more often than teams admit:
- the same avatar appears in multiple places
- a cell is recreated during state churn
- a prefetch path and a visible render path both fetch
- retry logic starts a new request before the old one is observed
Deduplication is cheap leverage.
If a request for a URL is already in flight, later callers should usually await the same task.
That buys you:
- less radio and CPU usage
- fewer duplicate decodes
- more stable scroll performance under churn
The tricky bit is the cache key.
If the same URL is rendered into multiple sizes, keying only by URL may be wrong for the memory cache. In that case, the key should include the variant too, for example:
- URL + target pixel size
- URL + processing mode
- URL + scale class
Disk cache keys and memory cache keys do not have to be identical. Trying to force one universal cache identity is usually a bad compromise.
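One hedged sketch of that split, where the type name and key formats are invented for illustration: the memory key carries the variant, the disk key does not.

```swift
import Foundation

// Illustrative sketch: one request identity, two cache keys.
struct ImageVariantKey: Hashable {
    let url: URL
    let maxPixelSize: Int

    // Memory cache key: variant-aware, so a 132 px avatar and a
    // 360 px thumbnail from the same URL are distinct entries.
    var memoryKey: String {
        "\(url.absoluteString)#\(maxPixelSize)"
    }

    // Disk/HTTP cache key: the original response is shared across
    // variants, so the URL alone is enough.
    var diskKey: String {
        url.absoluteString
    }
}
```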
6. Cancellation is part of correctness, not polish
If an image request outlives the view that asked for it, the work is often pointless.
In a fast-scrolling feed, off-screen work should be canceled aggressively.
SwiftUI makes it easy to forget this because .task(id:) feels convenient.
It is convenient. It is also easy to misuse.
A reasonable pattern is:
```swift
import SwiftUI

struct RemoteThumbnail: View {
    let url: URL
    let pipeline: ImagePipeline

    @State private var image: UIImage?

    var body: some View {
        Group {
            if let image {
                Image(uiImage: image)
                    .resizable()
                    .scaledToFill()
            } else {
                Color.secondary.opacity(0.12)
            }
        }
        .task(id: url) {
            do {
                image = try await pipeline.image(for: url, maxPixelSize: 240)
            } catch is CancellationError {
                // expected during fast scroll churn
            } catch {
                image = nil
            }
        }
    }
}
```
That is fine as long as the surrounding view identity is stable.
If the row identity is unstable, the task will restart constantly and your pipeline will look guilty for a bug that started in the list diffing layer.
This is why image loading and list performance are usually entangled. The image system cannot save a view tree that keeps pretending every row is new.
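As a sketch of stable identity, assuming the ImagePipeline and RemoteThumbnail from earlier, and a hypothetical Identifiable model named FeedItem:

```swift
import SwiftUI

// Hypothetical feed model. The point is the stable, server-backed id.
struct FeedItem: Identifiable {
    let id: String
    let imageURL: URL
}

struct FeedList: View {
    let items: [FeedItem]
    let pipeline: ImagePipeline

    var body: some View {
        // Stable identity: each row keeps its .task(id:) state across
        // diffs instead of restarting the load on every update.
        List(items) { item in
            RemoteThumbnail(url: item.imageURL, pipeline: pipeline)
        }
    }
}
```

If `id` were derived from array position or regenerated per refresh, every diff would look like new rows and every task would restart.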
7. Prefetching helps only when the rest of the pipeline is sane
Teams love saying “let’s add prefetching” as if it is automatically advanced.
It is not advanced. It is just easy to do badly.
Prefetching helps when:
- the next items are predictable
- requests are cancelable
- decoded variants are reused soon after
- memory pressure stays under control
Prefetching hurts when:
- the app fetches far more than the user will see
- decoded images crowd out visible ones
- the prefetch path duplicates visible work
- you cannot distinguish speculative work from demanded work
A blunt rule I like:
- make on-demand loading correct
- add request deduplication
- measure visible hitching and miss rate
- only then add small-window prefetching
If step 1 is shaky, step 4 just makes the bug happen earlier.
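Once those steps hold, a speculative layer on top of the ImagePipeline shown earlier can stay very small. A sketch; the class name and the low-priority policy are my assumptions:

```swift
import Foundation

// Sketch of cancelable, speculative prefetching on top of the
// ImagePipeline from earlier. Names and policy are illustrative.
final class ImagePrefetcher {
    private let pipeline: ImagePipeline
    private var tasks: [URL: Task<Void, Never>] = [:]

    init(pipeline: ImagePipeline) {
        self.pipeline = pipeline
    }

    func prefetch(_ urls: [URL], maxPixelSize: Int) {
        for url in urls where tasks[url] == nil {
            // Low priority keeps speculative work behind demanded work;
            // the pipeline's in-flight registry deduplicates against
            // the visible render path.
            tasks[url] = Task(priority: .low) {
                _ = try? await pipeline.image(for: url, maxPixelSize: maxPixelSize)
            }
        }
    }

    func cancelPrefetch(_ urls: [URL]) {
        for url in urls {
            tasks[url]?.cancel()
            tasks[url] = nil
        }
    }
}
```

This is the shape that plugs naturally into `UICollectionViewDataSourcePrefetching`, which hands you exactly these two calls: upcoming index paths and canceled ones.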
8. Avoid “generic image loader” abstractions that hide the expensive parts
A lot of codebases end up with a beautiful API and a mediocre pipeline.
Something like this:
```swift
protocol ImageLoading {
    func loadImage(from url: URL) async throws -> UIImage
}
```
Nice signature. Missing half the important decisions.
Where do size variants live? How is cancellation handled? What is cached, original or processed? How are failures and cache hits observed? Can the caller opt into lower priority prefetch work?
A generic abstraction is fine if it still models the real constraints.
A better shape is often closer to this:
```swift
struct ImageRequest: Hashable, Sendable {
    let url: URL
    let maxPixelSize: Int
}
```
Then the pipeline accepts ImageRequest, not just URL.
That one change forces the code to acknowledge that image loading is not only about transport. It is about the final render contract.
9. Instrument the pipeline or stop guessing
If image loading is important to the app, add basic observability.
You do not need a monitoring startup costume.
You do need a few numbers:
- memory-cache hit rate
- in-flight request count
- average decode time
- cancellation count
- average image byte size
- number of oversized assets hitting small surfaces
Even lightweight signposts help.
```swift
import os

private let log = OSLog(subsystem: "dev.vburojevic.website", category: "image-pipeline")

func measureDecode<T>(_ block: () throws -> T) rethrows -> T {
    let signpostID = OSSignpostID(log: log)
    os_signpost(.begin, log: log, name: "Decode", signpostID: signpostID)
    defer { os_signpost(.end, log: log, name: "Decode", signpostID: signpostID) }
    return try block()
}
```
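Alongside signposts, a few counters answer the hit-rate question directly. This tiny struct is an illustrative shape, not a real API:

```swift
// Illustrative metrics shape: count the events the pipeline already
// knows about and derive the hit rate from them.
struct ImagePipelineMetrics {
    var memoryHits = 0
    var memoryMisses = 0
    var cancellations = 0

    var hitRate: Double {
        let total = memoryHits + memoryMisses
        return total == 0 ? 0 : Double(memoryHits) / Double(total)
    }
}
```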
Now Instruments can show you whether the problem is:
- network latency
- repeated decode work
- oversized source images
- main-thread hitching during display
Without that, teams tend to cargo-cult fixes from old Slack threads.
10. The common failure patterns are boring, which is good news
The good news is that most image-loading problems are not exotic.
They are usually one of these:
1. Full-resolution decoding for tiny surfaces
Symptom:
- memory spikes
- scroll hitches when images first appear
Fix:
- downsample to display size
- cache the processed variant
2. No request deduplication
Symptom:
- duplicate network traffic
- repeated decode work
- stutter during rapid list updates
Fix:
- keep an in-flight task registry keyed by request variant
3. Work continues after the view disappears
Symptom:
- wasted network and CPU
- memory churn during fast scrolls
Fix:
- honor cancellation from view lifecycle
- separate speculative prefetch tasks from demanded work
4. Stable cache on disk, unstable cache in memory
Symptom:
- good network behavior but still janky rendering
Fix:
- store display-ready images in memory
- stop assuming URLCache is enough
5. The list diffing layer restarts everything
Symptom:
- image flicker
- repeated task restarts
- blame unfairly assigned to the image pipeline
Fix:
- fix row identity and state ownership first
11. A production rule worth keeping
If you only keep one rule from this post, keep this one:
Optimize for rendered pixels, not downloaded bytes.
Downloaded bytes matter for bandwidth. Rendered pixels are what decide decode cost, memory pressure, and scroll smoothness.
That framing leads to better decisions:
- variant-aware caching
- downsampling before display
- fewer giant images living in RAM for tiny views
- less pointless work when cells churn
It also stops the classic mistake of “we added caching, why is scrolling still bad?”
Because caching the wrong representation still gives you the wrong bottleneck.
12. The practical baseline I would ship
For most product apps, a sane baseline is:
- URLSession with normal HTTP caching behavior
- a memory cache keyed by ImageRequest
- background downsampling to target pixel size
- in-flight request deduplication
- aggressive cancellation for off-screen work
- lightweight metrics for hit rate, decode time, and cancellations
That setup is not fancy.
It is enough.
And that is the point. Image loading should feel invisible when it is healthy.
If users are noticing it, the bug is rarely that you need a more glamorous framework.
You probably just need to stop doing expensive work at the worst possible moment.