On 11/22/19 3:30 PM, Eric Sandeen wrote:
> On 11/22/19 3:10 PM, Eric Sandeen wrote:
>> On 11/21/19 5:18 PM, Dave Chinner wrote:
>>> On Thu, Nov 21, 2019 at 10:44:44PM +0100, Pavel Reichl wrote:
>>>> Signed-off-by: Pavel Reichl <preichl@xxxxxxxxxx>
>>>> ---
>>>
>>> This is mixing an explanation about why the change is being made
>>> and what was considered when making decisions about the change.
>>>
>>> e.g. my first questions on looking at the patch were:
>>>
>>> - why do we need to break up the discards into 2GB chunks?
>>> - why 2GB?
>>> - why not use libblkid to query the maximum discard size
>>>   and use that as the step size instead?
>>
>> Just wondering, can we trust that to be reasonably performant?
>> (the whole motivation here is for hardware that takes inordinately
>> long to do discard, I wonder if we can count on such hardware to
>> properly fill out this info....)
>
> Looking at the docs in kernel/Documentation/block/queue-sysfs.rst:
>
> discard_max_hw_bytes (RO)
> -------------------------
> Devices that support discard functionality may have internal limits on
> the number of bytes that can be trimmed or unmapped in a single operation.
> The discard_max_bytes parameter is set by the device driver to the maximum
> number of bytes that can be discarded in a single operation. Discard
> requests issued to the device must not exceed this limit. A discard_max_bytes
> value of 0 means that the device does not support discard functionality.
>
> discard_max_bytes (RW)
> ----------------------
> While discard_max_hw_bytes is the hardware limit for the device, this
> setting is the software limit. Some devices exhibit large latencies when
> large discards are issued, setting this value lower will make Linux issue
> smaller discards and potentially help reduce latencies induced by large
> discard operations.
>
> it seems like a strong suggestion that the discard_max_hw_bytes value may
> still be problematic, and discard_max_bytes can be hand-tuned to something
> smaller if it's a problem. To me that indicates that discard_max_hw_bytes
> probably can't be trusted to be performant, and presumably discard_max_bytes
> won't be either in that case unless it's been hand-tuned by the admin?

Lukas, Jeff Moyer reminded me that you did a lot of investigation into this
behavior a while back. Can you shed light on this, particularly how you chose
2G as the discard granularity for mke2fs?

Thanks,
-Eric
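
For illustration only, here is a rough Python sketch of the two options being
weighed in this thread: stepping through a range in fixed 2GB chunks, or
letting the device's reported discard_max_bytes shrink the step. It is not
code from the patch or from mke2fs. The helper names (read_discard_max_bytes,
pick_step, discard_chunks) are hypothetical, and clamping to the smaller of
the two values is an assumption made for the example, not a policy anyone in
the thread has agreed on. The sysfs path applies to whole-disk devices; it
only computes the chunk boundaries and issues no discards.

```python
import os

TWO_GIB = 2 * 1024**3  # the 2GB step size discussed in the thread


def read_discard_max_bytes(devname, sysfs_root="/sys/block"):
    """Return the queue's discard_max_bytes for a whole-disk device name
    (e.g. "sda"), or None if the attribute cannot be read or parsed."""
    path = os.path.join(sysfs_root, devname, "queue", "discard_max_bytes")
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pick_step(discard_max_bytes, cap=TWO_GIB):
    """Choose a discard step size (assumed policy, for illustration).

    None -> value unknown, fall back to the fixed cap.
    0    -> per the queue-sysfs docs, the device does not support discard.
    else -> the smaller of the reported limit and the cap.
    """
    if discard_max_bytes is None:
        return cap
    if discard_max_bytes == 0:
        return 0
    return min(discard_max_bytes, cap)


def discard_chunks(offset, length, step):
    """Yield (offset, length) pairs covering [offset, offset + length)
    in pieces of at most `step` bytes."""
    if step <= 0:
        raise ValueError("step must be positive")
    end = offset + length
    while offset < end:
        n = min(step, end - offset)
        yield offset, n
        offset += n
```

With a step of 2GB, a 5GB range would be split into two 2GB pieces and one 1GB
piece. Whether the reported limit is a good step size on slow hardware is
exactly the open question above, so the sketch keeps the 2GB cap in place.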