
Counting bytes on APFS.

If you've ever deleted a folder Finder claimed was 87 GB and watched your disk free up 12, you've met the problem this post is about. On APFS, the "size" of a file and the bytes it costs you on disk are two different numbers, and they often disagree by an order of magnitude. Here's why, and how an honest disk tool has to count.

01 The Xcode trap

The first time the problem showed up for us was an iOS DeviceSupport folder that Finder's Get Info reported as 87 GB. We deleted it, expecting an Xcode-sized weight off our shoulders, and the disk got 12 GB lighter. We checked the trash, checked the volume, ran diskutil. The 75 GB gap wasn't a bug. The 87 was just the wrong number to look at.

Most disk tools on the Mac report that 87. Finder does. Some of the popular treemap apps do. du -sh effectively does too: it sums allocated blocks rather than file lengths, but it charges a shared block once for every file that references it. The number isn't wrong, exactly, it's just logical: the total of the file lengths as the apps that wrote them perceive them. What it doesn't tell you is whether any of those files share storage with anyone else. Which, on APFS, is the entire question.

02 APFS clones

APFS, the filesystem Apple shipped with High Sierra, supports a thing Apple calls cloning. When you duplicate a file in Finder, or copy one with cp -c, or when Xcode populates a new DeviceSupport bundle from a previous one, APFS doesn't copy any bytes. It creates a new inode that points at the same disk blocks as the original. Both files have full read/write semantics; if you change one, that one diverges and only the differing blocks get new storage (copy-on-write). Until you do, the new "file" is a metadata entry and effectively free.

This is fantastic in everyday use. It's why a fresh Xcode install doesn't double your disk usage and why git clone --reference stops being meaningful when both repos are on the same APFS volume. It is, however, terrible for disk visualizers that assume "file size" means "bytes consumed." A clone-heavy folder will sum to dozens or hundreds of gigabytes logically while costing the volume only a few.
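To make the gap concrete, here is a toy model in Python. It is not APFS code and touches no real files: each "file" is just a list of block IDs, a clone shares its source's IDs, and the 4096-byte block size and all function names are assumptions chosen for the illustration.

```python
# Toy model: a file is a list of block IDs; a clone reuses its source's IDs.
BLOCK_SIZE = 4096  # assumed block size, for illustration only


def clone(blocks):
    """A fresh clone points at exactly the same blocks as its source."""
    return list(blocks)


def write_block(blocks, index, new_block_id):
    """Copy-on-write: only the modified block gets new storage."""
    updated = list(blocks)
    updated[index] = new_block_id
    return updated


def logical_bytes(files):
    """What a naive tool reports: every file's full length, summed."""
    return sum(len(blocks) for blocks in files.values()) * BLOCK_SIZE


def on_disk_bytes(files):
    """What the volume pays for: each distinct block counted once."""
    unique = set()
    for blocks in files.values():
        unique.update(blocks)
    return len(unique) * BLOCK_SIZE


original = list(range(1000))                     # 1000 blocks
copy_a = clone(original)                         # untouched clone
copy_b = write_block(clone(original), 0, 1000)   # clone with one block changed

files = {"original": original, "copy_a": copy_a, "copy_b": copy_b}
print(logical_bytes(files))   # 12288000 (3000 blocks)
print(on_disk_bytes(files))   # 4100096  (1001 blocks)
```

Three files, one of them modified, and the logical total is still almost exactly three times what the volume is paying for.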

04 iCloud dataless stubs

iCloud Drive creates the inverse problem. When a file is "optimized" out of local storage, the entry that remains on disk is a stub: a few kilobytes of metadata that, to the kernel and to most disk-walking code, still looks like the full file. st_size reports 4 GB; st_blocks reports 16 (512-byte units, so 8 KB). Sum the sizes naively and your ~/iCloud Drive folder appears to take up hundreds of gigabytes while actually costing you almost nothing.

Worse, the casual fix, "just read the file to figure out how big it really is," materializes the data, downloading the whole file from iCloud to answer a question you didn't actually need answered. Some otherwise-careful tools do this, and the bandwidth bill it generates on a slow connection is its own kind of disaster.

The right answer is the file's BSD flags: the st_flags field that stat and lstat return, or the same value fetched through getattrlist with ATTR_CMN_FLAGS in the attribute set. There's an SF_DATALESS flag that means exactly what it sounds like: the data isn't here, the size on disk is the stub size, don't try to read it. Honest accounting just checks the flag.
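A minimal Python sketch of that check, assuming a macOS volume. The names sizes and is_dataless are ours, made up for the illustration, and the SF_DATALESS value is the one we understand the macOS <sys/stat.h> header to define. On platforms without BSD flags there is no st_flags field, so the check simply reports False.

```python
import os

# SF_DATALESS as we understand macOS <sys/stat.h> to define it. Spelled out
# here because not every Python version exposes it in the stat module.
SF_DATALESS = 0x40000000


def is_dataless(st):
    """True if a stat result carries the dataless flag."""
    # st_flags is missing on platforms without BSD flags; treat that as 0.
    return bool(getattr(st, "st_flags", 0) & SF_DATALESS)


def sizes(path):
    """Return (logical, allocated, dataless) using metadata only.

    The file is never opened or read, so a cloud-only file should not be
    downloaded just to be measured.
    """
    st = os.lstat(path)
    logical = st.st_size              # length the application sees
    allocated = st.st_blocks * 512    # st_blocks counts 512-byte units
    return logical, allocated, is_dataless(st)
```

A scanner built on this would add allocated to its on-disk total and tag the entry when dataless is true, rather than trusting logical for either.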

05 How Delve counts

Every entry in the scan goes through three numbers:

The treemap uses dedup-aware on-disk size for the picture, with a toggle for logical size when you want to see "as it would appear if copied off." The inspector shows both side by side for every selection, with cloud-stub files tagged so it's obvious when the logical number is a fiction. Hardlinks are marked too: every inode gets one full-size tile, and every additional directory entry pointing at it gets a small ghost tile so you can find the duplicates without them inflating the picture.
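The hardlink half of that accounting can be sketched in a few lines of Python. This is not Delve's implementation, just one plausible shape for the walk, and scan is a hypothetical name. It charges each (device, inode) pair once. It does not detect clones, which have distinct inodes and need APFS-specific attributes that the sketch leaves out.

```python
import os
import stat


def scan(root):
    """Return (logical_total, on_disk_total) for regular files under root."""
    seen = set()          # (device, inode) pairs already charged
    logical_total = 0
    on_disk_total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, and other non-files
            # Logical size: every directory entry counts in full.
            logical_total += st.st_size
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # extra hardlink to an inode already charged
            seen.add(key)
            # On-disk size: allocated 512-byte blocks, once per inode.
            on_disk_total += st.st_blocks * 512
    return logical_total, on_disk_total
```

For a directory holding one 1000-byte file and a hardlink to it, the logical total is 2000 bytes while the on-disk total is that single inode's allocation.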

06 Why this matters in practice

Two scenarios where the gap between logical and on-disk size matters enough to change what you do:

The "I need 50 GB free by Friday" scenario. You scan the disk, the visualizer points at ~/Library/Developer, you delete it, you free 8 GB instead of the 47 you expected. The logical-only tool sent you on a wild goose chase. The dedup-aware tool would've told you which folders are actually unique and which are mostly clones of other folders. Two-thirds of your "fat" directories are usually the second kind.

The "what's actually in my iCloud folder" scenario. The logical-only tool tells you iCloud is 300 GB. The dedup-aware tool tells you it's 18, with 282 GB of cloud-only files you haven't materialized. Knowing which is which decides whether you spend an afternoon archiving or move on with your life.

07 The takeaway

The reason most Mac disk tools get this wrong isn't that the APIs are missing; st_blocks has been there forever and the dataless flag has been in the system headers for years. It's that the right way to count requires a stateful walk (inode dedup), a per-file branch (cloud stubs), and a UI that shows two numbers without making the second one feel like a technical footnote. None of that is hard. It's just work that very few apps decided to do.

Delve does it. If you want to see your real disk usage rather than the logical fiction, download it and point it at a clone-heavy folder. The first time the on-disk number comes back at a fifth of the logical one, you'll know exactly which 75 GB of Xcode is lying to you.
