On Thu, May 19, 2022 at 04:06:05PM -0700, Darrick J. Wong wrote:
> On Wed, May 18, 2022 at 04:50:05PM -0700, Eric Biggers wrote:
> > From: Eric Biggers <ebiggers@xxxxxxxxxx>
> >
> > Traditionally, the conditions for when DIO (direct I/O) is supported
> > were fairly simple: filesystems either supported DIO aligned to the
> > block device's logical block size, or didn't support DIO at all.
> >
> > However, due to filesystem features that have been added over time (e.g,
> > data journalling, inline data, encryption, verity, compression,
> > checkpoint disabling, log-structured mode), the conditions for when DIO
> > is allowed on a file have gotten increasingly complex. Whether a
> > particular file supports DIO, and with what alignment, can depend on
> > various file attributes and filesystem mount options, as well as which
> > block device(s) the file's data is located on.
> >
> > XFS has an ioctl XFS_IOC_DIOINFO which exposes this information to
> > applications. However, as discussed
> > (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@xxxxxxxxxx/T/#u),
> > this ioctl is rarely used and not known to be used outside of
> > XFS-specific code. It also was never intended to indicate when a file
> > doesn't support DIO at all, and it only exposes the minimum I/O
> > alignment, not the optimal I/O alignment which has been requested too.
> >
> > Therefore, let's expose this information via statx(). Add the
> > STATX_IOALIGN flag and three fields associated with it:
> >
> > * stx_mem_align_dio: the alignment (in bytes) required for user memory
> >   buffers for DIO, or 0 if DIO is not supported on the file.
> >
> > * stx_offset_align_dio: the alignment (in bytes) required for file
> >   offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
> >   on the file. This will only be nonzero if stx_mem_align_dio is
> >   nonzero, and vice versa.
> >
> > * stx_offset_align_optimal: the alignment (in bytes) suggested for file
> >   offsets and I/O segment lengths to get optimal performance. This
> >   applies to both DIO and buffered I/O. It differs from stx_blocksize
> >   in that stx_offset_align_optimal will contain the real optimum I/O
> >   size, which may be a large value. In contrast, for compatibility
> >   reasons stx_blocksize is the minimum size needed to avoid page cache
> >   read/write/modify cycles, which may be much smaller than the optimum
> >   I/O size. For more details about the motivation for this field, see
> >   https://lore.kernel.org/r/20220210040304.GM59729@xxxxxxxxxxxxxxxxxxx
>
> Hmm. So I guess this is supposed to be the filesystem's best guess at
> the IO size that will minimize RMW cycles in the entire stack? i.e. if
> the user does not want RMW of pagecache pages, of file allocation units
> (if COW is enabled), of RAID stripes, or in the storage itself, then it
> should ensure that all IOs are aligned to this value?
>
> I guess that means for XFS it's effectively max(pagesize, i_blocksize,
> bdev io_opt, sb_width, and (pretend XFS can reflink the realtime volume)
> the rt extent size)? I didn't see a manpage update for statx(2) but
> that's mostly what I'm interested in. :)

Yup, xfs_stat_blksize() should give a good idea of what we should do.
It will end up being pretty much that, except without the need for a
mount option to turn on the sunit/swidth return, and always taking into
consideration extent size hints rather than just doing that for RT
inodes...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
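
For illustration only, here is a rough userspace sketch (in Python, to
keep it short) of how the three proposed fields and the max() rule
discussed above might fit together. Nothing in it is kernel or libc
API: the IoAlign class, the function names and the parameter names are
all hypothetical, and the inputs to the max() are simply the ones
listed in this thread, with a generic extent size hint standing in for
the rt extent size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoAlign:
    """Hypothetical mirror of the three proposed statx fields (not a real API)."""
    mem_align_dio: int         # stx_mem_align_dio, 0 if DIO is unsupported
    offset_align_dio: int      # stx_offset_align_dio, 0 if DIO is unsupported
    offset_align_optimal: int  # stx_offset_align_optimal


def dio_supported(a: IoAlign) -> bool:
    # Per the proposal, the two DIO fields are either both zero or both nonzero.
    return a.mem_align_dio != 0 and a.offset_align_dio != 0


def dio_request_ok(a: IoAlign, buf_addr: int, offset: int, length: int) -> bool:
    """Check one I/O segment against the DIO alignment requirements."""
    if not dio_supported(a):
        return False
    return (buf_addr % a.mem_align_dio == 0
            and offset % a.offset_align_dio == 0
            and length % a.offset_align_dio == 0)


def optimal_align_guess(page_size: int, fs_block_size: int, bdev_io_opt: int,
                        stripe_width: int, extent_size_hint: int) -> int:
    """Assumed rule from the thread: take the largest of the candidate sizes.

    A value of 0 means "not set" and never wins the max().
    """
    return max(page_size, fs_block_size, bdev_io_opt,
               stripe_width, extent_size_hint)
```

With mem_align_dio=512 and offset_align_dio=4096, a buffer at 0x1000
written at offset 8192 with length 4096 passes the check, while the
same request at offset 512 does not.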