On Wed, Aug 16, 2017 at 09:49:22PM -0400, Martin K. Petersen wrote:
> e standards tweaked the definitions a bit so the semantics became
> even more confusing and harder to honor in the drivers.
>
> As a result, we changed things so that discards are only used to
> de-provision blocks. And the zeroout call/ioctl is used to zero block
> ranges.
>
> Which ATA/SCSI/NVMe command is issued on the back-end depends on what's
> supported by the device and is hidden from the caller.
>
> However, zeroout is guaranteed to return a zeroed block range on
> subsequent reads. The blocks may be unmapped, anchored, written
> explicitly, written with write same, or a combination thereof. But you
> are guaranteed predictable results.
>
> Whereas a discarded region may be sliced and diced and rounded off
> before it hits the device. Which is then free to ignore all or parts of
> the request.
>
> Consequently, discard_zeroes_data is meaningless. Because there is no
> guarantee that all of the discarded blocks will be acted upon. It
> kinda-sorta sometimes worked (if the device was whitelisted, had a
> reported alignment of 0, a granularity of 512 bytes, stacking didn't get
> in the way, and you were lucky on the device end). But there were always
> conditions.

Thanks for the detailed explanation. That's very useful to know!

>
> So taking a step back: What information specifically were you trying to
> obtain from querying that flag? And why do you need it?

There are many users that historically benefited from the
"discard_zeroes_data" semantics. For example mkfs, where it's beneficial
to discard the blocks before creating a file system, and if we also get
deterministic zeroes on read, even better, since we do not have to
initialize some portions of the file system manually.

The other example might be virtualization, where they can support
efficient "Wipe After Delete" and "Enable Discard" in case
"discard_zeroes_data" is set.

I am sure there are other examples.
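To make the mkfs case concrete, here is a rough sketch of what a tool
could do under the semantics described above: discard the device as a
best-effort hint, then use the zeroout ioctl for the ranges that must
read back as zeroes. The names prepare_device and pack_range are made up
for illustration, and the ioctl numbers are the <linux/fs.h> values as
encoded on x86 and ARM; other architectures may encode them differently.

```python
import fcntl
import os
import struct

# From <linux/fs.h>: BLKDISCARD is _IO(0x12, 119), BLKZEROOUT is _IO(0x12, 127).
# Assumption: the x86/ARM ioctl encoding, where _IO(type, nr) == (type << 8) | nr.
BLKDISCARD = 0x1277
BLKZEROOUT = 0x127F


def pack_range(start, length):
    """Pack a byte range as the uint64_t[2] {start, length} both ioctls expect."""
    return struct.pack("=QQ", start, length)


def prepare_device(path, meta_start, meta_len):
    """Hypothetical helper: discard everything, then zero one metadata range.

    WARNING: this destroys the contents of the block device at `path`.
    meta_start and meta_len should be multiples of the logical block size.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        size = os.lseek(fd, 0, os.SEEK_END)
        try:
            # Only a hint: the device may ignore all or part of the request.
            fcntl.ioctl(fd, BLKDISCARD, pack_range(0, size))
        except OSError:
            pass  # no discard support; nothing is lost by skipping it
        # This is the call that guarantees zeroes on subsequent reads.
        fcntl.ioctl(fd, BLKZEROOUT, pack_range(meta_start, meta_len))
    finally:
        os.close(fd)
```

The point of the split is that a failed or ignored discard is harmless
here, while a failure of the zeroout call is reported to the caller.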

So I understand now that Deterministic Read Zero after TRIM is not
reliable, so we do not want to use that flag because we can't guarantee
it in this case. However, there are other situations where we can, such
as a loop device (might be especially useful for VMs) where the backing
file system supports punch hole, or even SCSI write same with UNMAP?

Currently user space can call fallocate with
FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE; however, if that succeeds
we're only guaranteed that the range has been zeroed, not
unmapped/discarded? (That's not very clear from the comments.) None of
the modes seems to guarantee both zeroout and unmap in case of success.

However, there still seems to be no way to tell what's actually
supported from user space without ending up calling fallocate, is there?
Whereas before we had discard_zeroes_data, which people learned to rely
on in certain situations, even though it might have been shaky.

I actually like the rewrite that Christoph did, even though
documentation seems to be lacking. But I just wonder if it's possible to
bring back the former functionality, at least in some form.

Thanks!
-Lukas

>
> --
> Martin K. Petersen    Oracle Linux Engineering
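
P.S. To illustrate the fallocate point: as far as I can tell, the only
way for user space to find out whether punch hole works is to try it.
Below is a minimal sketch of such a probe on a regular file, assuming
64-bit Linux where off_t is 64 bits wide. The name probe_punch_hole is
made up, and the st_blocks difference is only a hint about whether
anything was actually deallocated.

```python
import ctypes
import errno
import os

# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02

# Assumption: 64-bit Linux, so off_t maps to a 64-bit integer.
_libc = ctypes.CDLL(None, use_errno=True)
_libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_int64, ctypes.c_int64]
_libc.fallocate.restype = ctypes.c_int


def probe_punch_hole(path, offset, length):
    """Punch a hole and report (supported, reads_zero, blocks_freed).

    WARNING: on success the data in [offset, offset + length) is gone.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        before = os.fstat(fd).st_blocks
        mode = FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
        if _libc.fallocate(fd, mode, offset, length) != 0:
            err = ctypes.get_errno()
            if err in (errno.EOPNOTSUPP, errno.ENOSYS):
                return (False, False, 0)  # not supported here
            raise OSError(err, os.strerror(err))
        data = os.pread(fd, length, offset)
        after = os.fstat(fd).st_blocks
        reads_zero = len(data) == length and data == bytes(length)
        # st_blocks counts 512-byte units; the difference is only a hint.
        return (True, reads_zero, before - after)
    finally:
        os.close(fd)
```

Note that this tells you about one file on one file system only after
the data is already gone, which is exactly the limitation compared to
querying a flag up front.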