On Wed, May 08, 2019 at 01:09:03PM -0400, Ric Wheeler wrote:
>
> On 5/8/19 1:03 PM, Martin K. Petersen wrote:
> > Ric,
> >
> > > That all makes sense, but I think it is orthogonal in large part to
> > > the need to get a good way to measure performance.
> > There are two parts to the performance puzzle:
> >
> >  1. How does mixing discards/zeroouts with regular reads and writes
> >     affect system performance?
> >
> >  2. How does issuing discards affect the tail latency of the device for
> >     a given workload? Is it worth it?
> >
> > Providing tooling for (1) is feasible whereas (2) is highly
> > workload-specific. So unless we can make the cost of (1) negligible,
> > we'll have to defer (2) to the user.
>
> Agree, but I think that there is also a base level performance question -
> how does the discard/zero perform by itself.
>
> Specifically, we have had to punt the discard of a whole block device before
> mkfs (back at RH) since it tripped up a significant number of devices.
> Similar pain for small discards (say one fs page) - is it too slow to do?

Small discards are already skipped if the device indicates it has
a minimum discard granularity. This is another reason why the "-o
discard" mount option isn't sufficient by itself and fstrim is still
required - filesystems often only free small isolated chunks of
space at a time and hence may never send discards to the device.

> > > For SCSI, I think the "WRITE_SAME" command *might* do discard
> > > internally or just might end up re-writing large regions of slow,
> > > spinning drives so I think it is less interesting.
> > WRITE SAME has an UNMAP flag that tells the device to deallocate, if
> > possible. The results are deterministic (unlike the UNMAP command).

That's kinda what I'm getting at here - we need to define the
behaviour the OS provides users, and then ensure that the behaviour
is standardised correctly so that devices behave correctly. i.e.
we want devices to support WRITE_SAME w/ UNMAP flag well (because
that's an exact representation of FALLOC_FL_PUNCH_HOLE
requirements), and don't really care about the UNMAP command.

> > WRITE SAME also has an ANCHOR flag which provides a use case we
> > currently don't have fallocate plumbing for: Allocating blocks without
> > caring about their contents. I.e. the blocks described by the I/O are
> > locked down to prevent ENOSPC for future writes.

So WRITE_SAME (0) with an ANCHOR flag does not return zeroes on
subsequent reads? i.e. it is effectively
fallocate(FALLOC_FL_NO_HIDE_STALE) preallocation semantics?

For many use cases we actually want zeroed space to be guaranteed
so we don't expose stale data from previous device use into the new
user's visibility - can that be done with WRITE_SAME and the ANCHOR
flag?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
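
A rough sketch of the small-discard point above, for illustration only.
It is a toy model, not the kernel's code: the function names, the 64 KiB
granularity and the 4 KiB page size are all assumptions. It treats online
discard as one request per freed extent and an fstrim-style pass as one
request per coalesced free range, which simplifies what filesystems
actually do.

```python
def aligned_discard(start, length, granularity):
    """Return the granularity-aligned (start, length) inside a freed byte
    range, or None if nothing aligned remains (the discard is skipped)."""
    if length <= 0:
        return None
    if granularity <= 1:
        return (start, length)
    first = -(-start // granularity) * granularity            # round start up
    last = ((start + length) // granularity) * granularity    # round end down
    if last <= first:
        return None
    return (first, last - first)


def coalesce(extents):
    """Merge adjacent or overlapping (start, length) extents."""
    merged = []
    for start, length in sorted(extents):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_len = merged[-1]
            new_len = max(prev_len, start + length - prev_start)
            merged[-1] = (prev_start, new_len)
        else:
            merged.append((start, length))
    return merged


GRANULARITY = 64 * 1024   # assumed device discard granularity
PAGE = 4096               # assumed filesystem page size

# 128 KiB of contiguous space, freed one page at a time.
frees = [(i * PAGE, PAGE) for i in range(32)]

# Hypothetical "online discard" model: each free is considered on its own.
online = [aligned_discard(s, l, GRANULARITY) for s, l in frees]

# Hypothetical "fstrim" model: free space is coalesced before discarding.
batched = [aligned_discard(s, l, GRANULARITY) for s, l in coalesce(frees)]

print(sum(r is not None for r in online), "discards sent online")
print(batched, "sent by the batched pass")
```

In this model none of the individual page-sized frees survives alignment,
while the batched pass over the same space yields one discard covering
the whole 128 KiB.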