On Tue, Feb 08, 2022 at 05:10:03PM -0800, Eric Biggers wrote:
> On Mon, Jan 24, 2022 at 10:03:32AM +1100, Dave Chinner wrote:
> > >
> > > 	/* 0xa0 */
> > >
> > > 	/* File range alignment needed for best performance, in bytes. */
> > > 	__u32	stx_dio_fpos_align_opt;
> >
> > This is a common property of both DIO and buffered IO, so no need
> > for it to be dio-only property.
> >
> > 	__u32	stx_offset_align_optimal;
> >
>
> Looking at this more closely: will stx_offset_align_optimal actually be useful,
> given that st[x]_blksize already exists?

Yes, because....

> From the stat(2) and statx(2) man pages:
>
> 	st_blksize
> 		This field gives the "preferred" block size for efficient
> 		filesystem I/O.
>
> 	stx_blksize
> 		The "preferred" block size for efficient filesystem I/O.
> 		(Writing to a file in smaller chunks may cause an
> 		inefficient read-modify-rewrite.)

... historically speaking, this is intended to avoid RMW cycles for
sub-block and/or sub-PAGE_SIZE write() IOs. i.e. the practical
definition of st_blksize is the *minimum* IO size needed to avoid
page cache RMW cycles.

However, XFS has a "-o largeio" mount option that sets this value to
internal optimal filesystem alignment values such as stripe unit or
even stripe width (-o largeio,swalloc). This means it can be up to
2GB (maybe larger?) in size.

The problem with this is that many applications are not prepared to
see a value of, say, 16MB in st_blksize rather than 4096 bytes. An
example of such problems is applications sizing their IO buffers as
a multiple of st_blksize - we've had applications fail because they
try to use multi-GB sized IO buffers as a result of setting
st_blksize to the filesystem/storage idea of optimal IO size rather
than PAGE_SIZE.

Hence, we can't really change the value of st_blksize without
risking random breakage in userspace. Hence the practical definition
of st_blksize is the *minimum* IO size that avoids RMW cycles for an
individual write() syscall, not the most efficient IO size.
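
To illustrate the failure mode above, here is a minimal sketch of how an
application could size an IO buffer from st_blksize without trusting it
blindly. The 4 KiB floor, the 1 MiB cap, and the helper names are
illustrative assumptions, not values or interfaces from this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical bounds chosen for illustration only. */
#define IO_BUF_MIN ((size_t)4096)
#define IO_BUF_MAX ((size_t)1 << 20)

/*
 * Clamp a reported block size into [IO_BUF_MIN, IO_BUF_MAX], so that a
 * very large st_blksize (e.g. 16MB or more with XFS "-o largeio") does
 * not turn into a huge buffer allocation.
 */
static size_t clamp_io_buf_size(long long blksize)
{
	assert(IO_BUF_MIN <= IO_BUF_MAX);

	if (blksize < (long long)IO_BUF_MIN)
		return IO_BUF_MIN;
	if (blksize > (long long)IO_BUF_MAX)
		return IO_BUF_MAX;
	return (size_t)blksize;
}

/* Return a buffer size for @path, or 0 if stat() fails. */
static size_t io_buf_size_for(const char *path)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return 0;
	return clamp_io_buf_size((long long)st.st_blksize);
}
```

An application written this way treats st_blksize as a lower bound for
avoiding RMW rather than as the optimal IO size, which matches the
practical definition given above.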

> File offsets aren't explicitly mentioned, but I think it's implied they should
> be a multiple of st[x]_blksize, just like the I/O size.  Otherwise, the I/O
> would obviously require reading/writing partial blocks.

Of course it implies aligned file offsets - block aligned IO is
absolutely necessary for efficient filesystem IO. It has been for
pretty much the entirety of unix history...

> So, the proposed stx_offset_align_optimal field sounds like the same thing to
> me.  Is there anything I'm misunderstanding?
>
> Putting stx_offset_align_optimal behind the STATX_DIRECTIO flag would also be
> confusing if it would apply to both direct and buffered I/O.

So just name the flag STATX_IOALIGN so that it can cover generic,
buffered specific and DIO specific parameters in one hit. Simple,
yes?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx