Zero to Photon:
Rendering 10^6 thumbnails with NSScrollView and Metal
2024 . 3 . 26
Photon can store 20k photos, so Photon Transfer naturally needs to be able to render 20k thumbnails with good scrolling performance.

We can do better though — we'll target rendering a million scrollable thumbnails with the strategy detailed in this post. If we can do that, Photon Transfer won't have any trouble using the same strategy to render 20k thumbnails.


Overview

If you're reading this, you probably know that if you slap 10^6 CALayers into a scroll view, you're gonna have a bad time.

Instead, to get the scrolling performance we need, we'll use the GPU directly with Metal APIs / CAMetalLayer, use a custom NSScrollView subclass to handle translation / magnification, and employ a few other tricks to efficiently shuttle thumbnail data from disk to the GPU.


GridLayer

Our CAMetalLayer subclass, GridLayer, is tasked with determining which thumbnails are currently visible and arranging for the GPU to render them with Metal.

For efficiency, GridLayer doesn't tell the GPU to render each visible thumbnail individually; rather, it tells the GPU to render groups of thumbnails, where each group consists of a pre-defined number of thumbnails — 2048 at the time of writing. That's because a group of thumbnails can be efficiently passed to the GPU as a so-called texture array.
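To make the grouping concrete, here's a minimal C++ sketch of how the visible groups could be derived from the visible rect. It assumes a fixed-column, row-major grid with uniform row heights; GridSpec, GroupRange and visibleGroups are hypothetical names for illustration, not Photon's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical grid description (assumed layout: row-major, uniform rows).
struct GridSpec {
    std::size_t thumbCount;  // total number of thumbnails
    std::size_t columns;     // thumbnails per row
    double cellHeight;       // row height in document points
    std::size_t groupSize;   // thumbnails per texture array (2048 in this post)
};

struct GroupRange {
    std::size_t first;  // index of the first visible group
    std::size_t count;  // number of visible groups (0 if nothing is visible)
};

// Returns the groups intersecting the vertical span [top, bottom) of the
// visible rect, expressed in document coordinates.
inline GroupRange visibleGroups(const GridSpec& g, double top, double bottom) {
    assert(g.columns > 0 && g.groupSize > 0 && g.cellHeight > 0);
    if (g.thumbCount == 0 || bottom <= 0 || bottom <= top) return {0, 0};

    const std::size_t rowCount = (g.thumbCount + g.columns - 1) / g.columns;
    const std::size_t firstRow =
        static_cast<std::size_t>(std::max(top, 0.0) / g.cellHeight);
    if (firstRow >= rowCount) return {0, 0};

    // Last row touched by the span; bottom is exclusive.
    std::size_t lastRow =
        static_cast<std::size_t>(std::ceil(bottom / g.cellHeight)) - 1;
    lastRow = std::min(lastRow, rowCount - 1);

    const std::size_t firstThumb = firstRow * g.columns;
    const std::size_t lastThumb =
        std::min((lastRow + 1) * g.columns, g.thumbCount) - 1;

    const std::size_t firstGroup = firstThumb / g.groupSize;
    const std::size_t lastGroup = lastThumb / g.groupSize;
    return {firstGroup, lastGroup - firstGroup + 1};
}
```

With 8 columns and 100-point rows, each group of 2048 thumbnails spans 256 rows, so a typical window-sized visible rect touches only one or two groups per frame.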

GridLayer is also responsible for applying the correct translation and scaling transformation to the thumbnails to match the current scroll offset and magnification level of the NSScrollView. This information is provided by AnchoredScrollView, discussed below.
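As a rough illustration of that transformation, the sketch below maps document coordinates to Metal's normalized device coordinates (x and y in [-1, 1], with y pointing up). It assumes a flipped document coordinate system (y grows downward) and a uniform magnification; Transform2D, documentToNDC and apply are made-up names, and the real GridLayer may well build its matrix differently.

```cpp
#include <cassert>

// Scale-and-translate transform: ndc = (x * sx + tx, y * sy + ty).
struct Transform2D {
    double sx, sy, tx, ty;
};

struct Point {
    double x, y;
};

// originX/originY: top-left of the visible rect in document coordinates.
// magnification:   scroll view magnification (1.0 = unscaled).
// viewW/viewH:     size of the scroll view's visible area in points.
inline Transform2D documentToNDC(double originX, double originY,
                                 double magnification,
                                 double viewW, double viewH) {
    assert(magnification > 0 && viewW > 0 && viewH > 0);
    const double sx = 2.0 * magnification / viewW;
    const double sy = -2.0 * magnification / viewH;  // flip: document y goes down
    // Chosen so the visible rect's top-left corner lands on (-1, +1).
    return {sx, sy, -1.0 - originX * sx, 1.0 - originY * sy};
}

inline Point apply(const Transform2D& t, Point p) {
    return {p.x * t.sx + t.tx, p.y * t.sy + t.ty};
}
```

The four values fit in a small uniform buffer, so the vertex shader can position every thumbnail with one multiply-add per coordinate rather than having the CPU move anything.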

Caching

GridLayer uses an LRU cache to hold on to the most recent thumbnail textures, in order to speed up redrawing. As the user scrolls, new thumbnails come into view and are therefore loaded and placed into the cache, thereby evicting the least-recently-used thumbnail textures.
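For readers who haven't built one, here's a minimal LRU cache sketch in C++. It's a generic textbook version (a linked list for recency order plus a hash map for lookup), not GridLayer's implementation; in our setting the key would plausibly be a group index and the value a texture handle, but that pairing is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <utility>

template <typename Key, typename Value>
class LRUCache {
public:
    explicit LRUCache(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns the cached value and marks it most recently used,
    // or nullptr on a cache miss.
    Value* get(const Key& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        order_.splice(order_.begin(), order_, it->second);  // move to front
        return &it->second->second;
    }

    // Inserts or updates an entry, evicting the least recently used
    // entry if the cache is full.
    void put(const Key& key, Value value) {
        auto it = index_.find(key);
        if (it != index_.end()) {
            it->second->second = std::move(value);
            order_.splice(order_.begin(), order_, it->second);
            return;
        }
        if (order_.size() == capacity_) {
            index_.erase(order_.back().first);  // back = least recently used
            order_.pop_back();
        }
        order_.emplace_front(key, std::move(value));
        index_[key] = order_.begin();
    }

    std::size_t size() const { return order_.size(); }

private:
    using Entry = std::pair<Key, Value>;
    std::size_t capacity_;
    std::list<Entry> order_;  // front = most recently used
    std::unordered_map<Key, typename std::list<Entry>::iterator> index_;
};
```

Both get and put run in constant time on average, which matters because the cache is consulted for every visible group on every redraw.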

AnchoredScrollView

AnchoredScrollView is an NSScrollView subclass that tweaks NSScrollView's standard behavior. Instead of letting NSScrollView translate and scale its document view, AnchoredScrollView keeps the document view anchored to the visible rect of the NSScrollView and passes the translation/scaling information to the document view, so that the document view can handle translation/scaling itself.

Specifically, that means AnchoredScrollView supplies the transformation matrix to GridLayer, and GridLayer in turn supplies it to the GPU.

This strategy might raise the question: if we're undoing the translation/magnification behavior of NSScrollView, why use NSScrollView at all?

We still want to use NSScrollView to retain its customary behaviors that users expect on macOS:

  • scroll bar rendering
  • rubber-banding when reaching the limits of the scroll view
  • NSWindow titlebar underlay effects
  • momentum-scrolling when using trackpads / Magic Mouse

Texture Compression

We employ lossy texture compression (ASTC on ARM, BC7 on Intel) to minimize the size of thumbnails on disk. The thumbnails in the example repo are 160 x 90 pixels. Uncompressed, that amounts to 57 KB per thumbnail, while compressed that's 14 KB per thumbnail.

The combined 10^6 thumbnails are therefore roughly 57 GB uncompressed, versus roughly 14 GB compressed.
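The arithmetic behind those figures can be sketched as follows. The parameters are assumptions rather than facts from the example repo: 4 bytes per pixel for the uncompressed case, and 4 x 4 pixel blocks of 16 bytes each for the compressed case (the block size BC7 always uses, and one of the footprints ASTC offers).

```cpp
#include <cassert>
#include <cstdint>

// Size of an uncompressed image, assuming 4 bytes per pixel by default.
inline std::uint64_t uncompressedBytes(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t bytesPerPixel = 4) {
    return width * height * bytesPerPixel;
}

// Size of a block-compressed image. Dimensions are rounded up to whole
// blocks; the defaults assume 4 x 4 pixel blocks of 16 bytes each.
inline std::uint64_t blockCompressedBytes(std::uint64_t width, std::uint64_t height,
                                          std::uint64_t blockDim = 4,
                                          std::uint64_t bytesPerBlock = 16) {
    assert(blockDim > 0);
    const std::uint64_t blocksX = (width + blockDim - 1) / blockDim;
    const std::uint64_t blocksY = (height + blockDim - 1) / blockDim;
    return blocksX * blocksY * bytesPerBlock;
}
```

For a 160 x 90 thumbnail this gives 57,600 bytes uncompressed and 14,720 bytes compressed (90 isn't a multiple of 4, so the height rounds up to 23 blocks). That's in line with the per-thumbnail figures above, and multiplying by 10^6 gives the totals.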

Decompression

The compressed image data needs to be decompressed in order to be displayed. GridLayer uses the -replaceRegion: Metal API to perform this decompression, and does so lazily when the thumbnails come into view (assuming the needed textures aren't already in the LRU cache).

mmap

The compressed thumbnail textures are stored on disk as one giant blob of data. For performance and convenience, we mmap this blob into our address space and let the virtual memory subsystem handle shuttling the data between RAM and disk.
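Here's a minimal sketch of the mmap approach using the POSIX calls available on macOS. MappedFile is a hypothetical wrapper, and the fixed-size record layout assumed by record() is an illustration only; the real blob format may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Read-only memory mapping of a whole file, unmapped on destruction.
class MappedFile {
public:
    explicit MappedFile(const char* path) {
        const int fd = ::open(path, O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed");

        struct stat st {};
        if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed or file is empty");
        }
        size_ = static_cast<std::size_t>(st.st_size);

        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
        ::close(fd);  // the mapping stays valid after the descriptor is closed
        if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
        data_ = static_cast<const std::uint8_t*>(p);
    }

    ~MappedFile() {
        if (data_) ::munmap(const_cast<std::uint8_t*>(data_), size_);
    }

    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    // Pointer to record `index`, assuming fixed-size records (hypothetical
    // layout). Pages are only read from disk when they are first touched.
    const std::uint8_t* record(std::size_t index, std::size_t recordSize) const {
        assert(recordSize > 0 && (index + 1) * recordSize <= size_);
        return data_ + index * recordSize;
    }

    std::size_t size() const { return size_; }

private:
    const std::uint8_t* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Mapping a multi-gigabyte file this way costs almost nothing up front: no data is read until a page is touched, and the kernel is free to drop clean pages again under memory pressure.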


Improvements

There are two ways that the scrolling performance could be improved:
  1. Background thumbnail loading

    Instead of decompressing and loading our thumbnail textures on the main thread, we could instead load them on a background thread and display a placeholder until the texture is loaded. This would prevent the stutters that occur when loading data on the main thread.

    This could be further improved by displaying placeholder content that matches the color tone of the thumbnail that's about to appear. (This placeholder color would be stored in the giant texture blob alongside the compressed image data.)

  2. Anticipatory thumbnail loading

    To maximize the chance of a cache hit when requesting thumbnails on the main thread, a background thread could load the thumbnails for regions that are adjacent to the visible region of the scroll view.
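As a starting point for the anticipatory loading idea, the sketch below picks which thumbnail groups a background thread might load next: the groups just outside the visible range, nearest first. The function name, the margin parameter and the ordering policy are all assumptions; the threading itself and the placeholder rendering are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the indices of groups adjacent to the visible range
// [firstVisible, firstVisible + visibleCount), up to `margin` groups in
// each direction, ordered by distance from the visible range.
inline std::vector<std::size_t> prefetchCandidates(std::size_t firstVisible,
                                                   std::size_t visibleCount,
                                                   std::size_t groupCount,
                                                   std::size_t margin) {
    assert(visibleCount > 0 && firstVisible + visibleCount <= groupCount);
    std::vector<std::size_t> out;
    const std::size_t lastVisible = firstVisible + visibleCount - 1;
    for (std::size_t d = 1; d <= margin; ++d) {
        if (lastVisible + d < groupCount) out.push_back(lastVisible + d);  // below
        if (firstVisible >= d) out.push_back(firstVisible - d);            // above
    }
    return out;
}
```

A background thread could walk this list, skip anything already in the LRU cache, and load the rest. Biasing the order toward the current scroll direction would be a natural refinement.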