Ric,

> That all makes sense, but I think it is orthogonal in large part to
> the need to get a good way to measure performance.

There are two parts to the performance puzzle:

 1. How does mixing discards/zeroouts with regular reads and writes
    affect system performance?

 2. How does issuing discards affect the tail latency of the device for
    a given workload? Is it worth it?

Providing tooling for (1) is feasible whereas (2) is highly
workload-specific. So unless we can make the cost of (1) negligible,
we'll have to defer (2) to the user.

> For SCSI, I think the "WRITE_SAME" command *might* do discard
> internally or just might end up re-writing large regions of slow,
> spinning drives so I think it is less interesting.

WRITE SAME has an UNMAP flag that tells the device to deallocate, if
possible. The results are deterministic (unlike the UNMAP command).

WRITE SAME also has an ANCHOR flag which provides a use case we
currently don't have fallocate plumbing for: Allocating blocks without
caring about their contents. I.e. the blocks described by the I/O are
locked down to prevent ENOSPC for future writes.

-- 
Martin K. Petersen    Oracle Linux Engineering