Re: [LSF/MM TOPIC] More async operations for file systems - async discard?

"Martin K. Petersen" <martin.petersen@xxxxxxxxxx> · Thu, 21 Feb 2019 22:01:24 -0500

Jeff,

> We've always been told "don't worry about what the internal block size
> is, that only matters to the FTL."  That's obviously not true, but
> when devices only report a 512 byte granularity, we believe them and
> will issue discard for the smallest size that makes sense for the file
> system regardless of whether it makes sense (internally) for the SSD.
> That means 4k for pretty much anything except btrfs metadata nodes,
> which are 16k.

The devices are free to report a bigger discard granularity. We already
support and honor that (for SCSI, anyway). It's completely orthogonal to
the reported logical block size, although it obviously needs to be a
multiple of it.
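
To make that concrete, here is a rough Python sketch of how one could
check what a device advertises through the block queue sysfs attributes
(logical_block_size, discard_granularity, discard_max_bytes) and trim a
byte range down to the part that is aligned to the granularity. The
helper names and the sysfs_root parameter are made up for illustration;
this is not how the kernel does it internally, and an advertised
granularity says nothing about whether the device actually acts on the
command.

```python
from pathlib import Path


def read_queue_attr(dev, name, sysfs_root="/sys/block"):
    """Read one integer attribute from <sysfs_root>/<dev>/queue/<name>."""
    return int((Path(sysfs_root) / dev / "queue" / name).read_text().strip())


def discard_info(dev, sysfs_root="/sys/block"):
    """Summarize the advertised discard limits (hypothetical helper)."""
    lbs = read_queue_attr(dev, "logical_block_size", sysfs_root)
    gran = read_queue_attr(dev, "discard_granularity", sysfs_root)
    max_bytes = read_queue_attr(dev, "discard_max_bytes", sysfs_root)
    return {
        "logical_block_size": lbs,
        "discard_granularity": gran,
        # A discard_max_bytes of 0 means discard is not advertised at all.
        "advertised": max_bytes > 0 and gran > 0,
        # The granularity is expected to be a multiple of the block size.
        "multiple_of_lbs": gran > 0 and gran % lbs == 0,
    }


def round_to_granularity(start, length, gran):
    """Shrink [start, start+length) to its granularity-aligned interior.

    Returns (start, length) in bytes, or None if no whole unit fits.
    """
    first = -(-start // gran) * gran          # round start up
    last = (start + length) // gran * gran    # round end down
    return (first, last - first) if last > first else None
```

With a 16k granularity, for instance, a lone 4k extent has no aligned
interior, so round_to_granularity() returns None for it and nothing
would be sent.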

The real problem is that vendors have zero interest in optimizing for
discard. They are so confident in their FTL and overprovisioning that
they don't view it as an important feature. At all.

Consequently, many of the modern devices that claim to support discard
to make us software folks happy (or to satisfy a purchase order
requirement) complete the commands without doing anything at all.
We're simply wasting queue slots.

Personally, I think discard is dead on anything but the cheapest
devices.  And on those it is probably going to be
performance-prohibitive to use it in any other way than a weekly fstrim.

-- 
Martin K. Petersen	Oracle Linux Engineering