If you've ever deleted a folder Finder claimed was 87 GB and watched your
disk free up 12, you've met the problem this post is about. On APFS, the
"size" of a file and the bytes it costs you on disk are two different
numbers, and they often disagree by an order of magnitude. Here's why,
and how an honest disk tool has to count.
01 The Xcode trap
The first time the problem showed up for us was an
iOS DeviceSupport folder that Finder's Get Info reported as
87 GB. We deleted it, expecting an Xcode-sized weight off our shoulders,
and the disk got 12 GB lighter. We checked the trash, checked the volume,
ran diskutil. The 75 GB gap wasn't a bug. The 87 was just
the wrong number to look at.
Most disk tools on the Mac report that 87. du -sh gets there too, by
default, because it counts each clone's blocks in full. Finder does.
Some of the popular treemap apps do. The number isn't wrong, exactly;
it's just logical: the total of the file lengths as the apps that
wrote them perceive them. What it doesn't tell you is whether any of
those files share storage with anyone else. Which, on APFS, is the
entire question.
02 APFS clones
APFS, the filesystem Apple shipped with High Sierra, supports a thing
Apple calls cloning. When you duplicate a file in Finder, or
copy one with cp -c, or when Xcode populates a new
DeviceSupport bundle from a previous one, APFS doesn't copy any bytes.
It creates a new inode that points at the same disk blocks as the
original. Both files have full read/write semantics; if you change
one, that one diverges and only the differing blocks get new storage
(copy-on-write). Until you do, the new "file" is a metadata entry
and effectively free.
This is fantastic in everyday use. It's why a fresh Xcode install
doesn't double your disk usage and why git clone --reference
stops being meaningful when both repos are on the same APFS volume.
It is, however, terrible for disk visualizers that assume "file size"
means "bytes consumed." A clone-heavy folder will sum to dozens or
hundreds of gigabytes logically while costing the volume only a few.
03 Hardlinks, which are not the same thing
A hardlink is a second name for the same inode. Time Machine's HFS+
backups use them heavily; some package managers do too. From the filesystem's
perspective there is one file and several directory entries pointing
at it. From a disk-visualizer's perspective there are several
files, each apparently full-sized, and an unsuspecting walk
will count every one of them.
The fix rests on the same idea as for clones: count shared storage
only once. For hardlinks that means deduping by inode. Walk the
directory tree, keep a set of inode numbers you've already counted
(paired with st_dev if the walk crosses volumes), and skip the
duplicates. Every modern stat call returns st_ino alongside st_size;
the work is bookkeeping, not analysis. It's just that almost nobody bothers,
because the duplicates only become visible on Time Machine backups
and other backup-shaped folders, and "your backup folder reports
the wrong size" is the kind of bug nobody files.
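A minimal sketch of that walk, in Python for brevity. The function name on_disk_size_deduped is ours, not from any tool; it only counts regular directory entries found by os.walk, and it ignores the blocks used by directories themselves. Because each clone has its own inode, this catches hardlinks only.

```python
import os

def on_disk_size_deduped(root):
    """Sum allocated bytes under root, counting each inode only once."""
    seen = set()  # (device, inode) pairs that have already been counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for a file we already counted
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks is in 512-byte units
    return total
```

Keying on the (st_dev, st_ino) pair rather than st_ino alone matters as soon as the tree spans more than one volume, since inode numbers are only unique within a filesystem.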
04 iCloud dataless stubs
iCloud Drive poses the inverse problem. When a file is "optimized" out
of local storage, the entry that remains on disk is a stub: a few
kilobytes of metadata that, to the kernel and to most disk-walking
code, still looks like the full file. st_size reports 4 GB; st_blocks
reports 16, which at 512 bytes per block is 8 KB. Sum the sizes naively
and your ~/iCloud Drive folder appears to take up hundreds of gigabytes
while actually costing you almost nothing.
Worse, the casual fix, "just read the file to figure out how big it
really is," materializes the data, downloading the whole file from
iCloud to answer a question you didn't actually need answered. Some
otherwise-careful tools do this, and the bandwidth bill it generates
on a slow connection is its own kind of disaster.
The right answer is the st_flags field that stat already returns
(getattrlist exposes the same flags if you ask for them in the
attribute set). There's an SF_DATALESS flag that means exactly what it
sounds like: the data isn't here, the size on disk is the stub size,
don't try to read it. Honest accounting just checks the flag.
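As a rough sketch of that check in Python: the helper names is_dataless and allocated_bytes are hypothetical, and the fallback value 0x40000000 for SF_DATALESS is our reading of the macOS <sys/stat.h> header, so treat it as an assumption to verify against your SDK. The code only calls lstat and never opens the file, so it cannot trigger a download.

```python
import os
import stat

# Assumption: SF_DATALESS is 0x40000000 in macOS <sys/stat.h>. Use the
# constant from Python's stat module when this Python version provides it.
SF_DATALESS = getattr(stat, "SF_DATALESS", 0x40000000)

def is_dataless(st):
    """True if a stat result carries the dataless (cloud stub) flag."""
    # st_flags exists on macOS/BSD only; treat it as 0 elsewhere.
    return bool(getattr(st, "st_flags", 0) & SF_DATALESS)

def allocated_bytes(path):
    """Return (bytes on disk, is_stub) using metadata only, no reads."""
    st = os.lstat(path)
    return st.st_blocks * 512, is_dataless(st)
```

A scanner would use the second value to tag the entry as cloud-only and trust the block count over st_size for it.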
Every entry in the scan is measured with three numbers:
Logical size. The value of st_size,
same as Finder's Get Info. Useful for "how much will this take
if I copy it to a non-APFS drive."
On-disk size, before dedup. The value of
st_blocks × 512. What the file would cost
on disk if it weren't sharing storage with anyone.
On-disk size, after dedup. The above, but with
inodes we've already seen contributing zero. This is the number
that actually answers "how much disk space will I get back if I
delete this folder." It's also the number every disk visualizer
should be showing as the default, and most don't.
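One way to picture the three numbers is as a small record computed from each stat result. This is an illustrative Python sketch, not Delve's actual code; EntrySizes and measure are names we made up, and the dedup here is by inode only, so it does not account for storage shared between clones.

```python
import os
from dataclasses import dataclass

@dataclass(frozen=True)
class EntrySizes:
    logical: int        # st_size, what Finder's Get Info shows
    on_disk: int        # st_blocks * 512, before dedup
    on_disk_dedup: int  # same, but 0 for an inode already counted

def measure(st, seen):
    """Compute the three sizes for one stat result.

    `seen` is a set of (st_dev, st_ino) pairs shared across the whole
    scan; it is updated in place.
    """
    allocated = st.st_blocks * 512
    key = (st.st_dev, st.st_ino)
    first_sighting = key not in seen
    seen.add(key)
    return EntrySizes(st.st_size, allocated,
                      allocated if first_sighting else 0)
```

A scan would create one `seen = set()` up front and call `measure(os.lstat(path), seen)` for every entry, then sum each field separately to get the three totals for a folder.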
The treemap uses dedup-aware on-disk size for the picture, with a
toggle for logical size when you want to see "as it would appear if
copied off." The inspector shows both side by side for every
selection, with cloud-stub files tagged so it's obvious when the
logical number is a fiction. Hardlinks are marked too: every inode
gets one full-size tile, and every additional directory entry pointing
at it gets a small ghost tile so you can find the duplicates without
them inflating the picture.
06 Why this matters in practice
Two scenarios where the gap between logical and on-disk size matters
enough to change what you do:
The "I need 50 GB free by Friday" scenario.
You scan the disk, the visualizer points at ~/Library/Developer,
you delete it, you free 8 GB instead of the 47 you expected. The
logical-only tool sent you on a wild goose chase. The dedup-aware
tool would've told you which folders are actually unique and which
are mostly clones of other folders. Two-thirds of your "fat"
directories are usually the second kind.
The "what's actually in my iCloud folder" scenario.
The logical-only tool tells you iCloud is 300 GB. The dedup-aware
tool tells you it's 18, with 282 GB of cloud-only files you
haven't materialized. Knowing which is which decides whether you
spend an afternoon archiving or move on with your life.
07 The takeaway
The reason most Mac disk tools get this wrong isn't that the
APIs are missing; st_blocks has been there forever
and the cloud-stub flag has been there since iCloud Drive shipped.
It's that the right way to count requires a stateful walk
(inode dedup), a per-file branch (cloud stubs), and a UI that
shows two numbers without making the second one feel like a
technical footnote. None of that is hard. It's just work that
very few apps decided to do.
Delve does it. If you want to see your real disk usage rather
than the logical fiction, download it
and point it at a clone-heavy folder. The first time the on-disk
number comes back at a fifth of the logical one, you'll know
exactly which 75 GB of Xcode is lying to you.