Re: [PATCH] mkfs: use stx_blksize for dev block size by default

Daniel Gomez <da.gomez@xxxxxxxxxx> · Fri, 7 Feb 2025 11:04:06 +0100

On Thu, Feb 06, 2025 at 08:30:22PM +0100, Christoph Hellwig wrote:
> On Thu, Feb 06, 2025 at 07:00:55PM +0000, da.gomez@xxxxxxxxxx wrote:
> > From: Daniel Gomez <da.gomez@xxxxxxxxxxx>
> > 
> > In patch [1] ("bdev: use bdev_io_min() for statx block size"), block
> > devices will now report their preferred minimum I/O size for optimal
> > performance in the stx_blksize field of the statx data structure. This
> > change updates the current default 4 KiB block size for all devices
> > reporting a minimum I/O larger than 4 KiB, opting instead to query for
> > its advertised minimum I/O value in the statx data struct.
> 
> UUuh, no.  Larger block sizes have their use cases, but this will
> regress performance for a lot (most?) common setups.  A lot of
> device report fairly high values there, but say increasing the

Are these devices reporting the correct value? As I mentioned in my discussion
with Darrick, matching the "fs fundamental blocksize" to the minimum_io_size
actually avoids RMW operations (when using the default path in mkfs.xfs and
the reported value is within boundaries).
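
To make the alignment argument concrete, here is a rough sketch of the
condition I have in mind. It is a simplified model, not mkfs.xfs or kernel
code: the function name is made up for illustration, and it assumes every
write is exactly one fs block, aligned to the fs block size, on a partition
that is itself aligned to minimum_io_size.

```python
def needs_rmw(fs_block_size: int, min_io_size: int) -> bool:
    """Return True if an aligned write of one fs block cannot cover whole
    minimum-I/O units, so the device would have to read-modify-write.

    Simplified model (assumption): the device only avoids RMW when the
    write size is a multiple of its minimum I/O size.
    """
    if fs_block_size <= 0 or min_io_size <= 0:
        raise ValueError("sizes must be positive")
    return fs_block_size % min_io_size != 0


if __name__ == "__main__":
    # (fs block size, device minimum_io_size) in bytes
    for fs_block, min_io in [(4096, 512), (4096, 16384), (16384, 16384)]:
        print(f"fs block {fs_block:>5}, min io {min_io:>5}: "
              f"RMW needed = {needs_rmw(fs_block, min_io)}")
```

Under these assumptions, a 4k block on a device with a 16k minimum I/O size
is the only one of the three cases that needs RMW.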

> directory and bmap btree block size unconditionally using the current
> algorithms will dramatically increase write amplification.  Similarly

I agree, but it seems to be a consequence of using such a large minimum_io_size.

> for small buffered writes.
> 

Exactly. Write amplification already happens with a 512 byte minimum_io_size
and a 4k default block size, but there it doesn't incur a performance penalty.
We do incur one once minimum_io_size exceeds 4k. So this patch solves that
issue, but at the cost of more write amplification.
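
For the small buffered write case, the trade-off can be estimated with a
back-of-the-envelope model like the one below. It is only a sketch under
stated assumptions: the helper name is hypothetical, it assumes every fs
block touched by a write is written back in full, and it ignores metadata,
journaling and writeback batching.

```python
def write_amplification(offset: int, length: int, block_size: int) -> float:
    """Bytes written back divided by bytes the application wrote.

    Assumption: each fs block touched by the byte range
    [offset, offset + length) is written out as a whole block.
    """
    if length <= 0 or block_size <= 0 or offset < 0:
        raise ValueError("offset must be >= 0, length and block_size > 0")
    first_block = offset // block_size
    last_block = (offset + length - 1) // block_size
    bytes_written = (last_block - first_block + 1) * block_size
    return bytes_written / length


if __name__ == "__main__":
    # A 512 byte buffered write at offset 0, for two fs block sizes.
    for block_size in (4096, 16384):
        factor = write_amplification(0, 512, block_size)
        print(f"block size {block_size:>5}: amplification x{factor:g}")
```

In this model a 512 byte write is amplified 8x with a 4k block size and 32x
with a 16k block size, which is the cost I am referring to above.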