On Thu, Jun 16, 2022 at 01:14:59PM -0700, Eric Biggers wrote:
> From: Eric Biggers <ebiggers@xxxxxxxxxx>
> 
> Traditionally, the conditions for when DIO (direct I/O) is supported
> were fairly simple.  For both block devices and regular files, DIO had
> to be aligned to the logical block size of the block device.
> 
> However, due to filesystem features that have been added over time (e.g.
> multi-device support, data journalling, inline data, encryption, verity,
> compression, checkpoint disabling, log-structured mode), the conditions
> for when DIO is allowed on a regular file have gotten increasingly
> complex.  Whether a particular regular file supports DIO, and with what
> alignment, can depend on various file attributes and filesystem mount
> options, as well as which block device(s) the file's data is located on.
> 
> Moreover, the general rule of DIO needing to be aligned to the block
> device's logical block size is being relaxed to allow user buffers (but
> not file offsets) aligned to the DMA alignment instead
> (https://lore.kernel.org/linux-block/20220610195830.3574005-1-kbusch@xxxxxx/T/#u).
> 
> XFS has an ioctl XFS_IOC_DIOINFO that exposes DIO alignment information.
> Uplifting this to the VFS is one possibility.  However, as discussed
> (https://lore.kernel.org/linux-fsdevel/20220120071215.123274-1-ebiggers@xxxxxxxxxx/T/#u),
> this ioctl is rarely used and not known to be used outside of
> XFS-specific code.  It was also never intended to indicate when a file
> doesn't support DIO at all, nor was it intended for block devices.
> 
> Therefore, let's expose this information via statx().  Add the
> STATX_DIOALIGN flag and two new statx fields associated with it:
> 
> * stx_dio_mem_align: the alignment (in bytes) required for user memory
>   buffers for DIO, or 0 if DIO is not supported on the file.
> 
> * stx_dio_offset_align: the alignment (in bytes) required for file
>   offsets and I/O segment lengths for DIO, or 0 if DIO is not supported
>   on the file.
>   This will only be nonzero if stx_dio_mem_align is
>   nonzero, and vice versa.
> 
> Note that as with other statx() extensions, if STATX_DIOALIGN isn't set
> in the returned statx struct, then these new fields won't be filled in.
> This will happen if the file is neither a regular file nor a block
> device, or if the file is a regular file and the filesystem doesn't
> support STATX_DIOALIGN.  It might also happen if the caller didn't
> include STATX_DIOALIGN in the request mask, since statx() isn't required
> to return unrequested information.
> 
> This commit only adds the VFS-level plumbing for STATX_DIOALIGN.  For
> regular files, individual filesystems will still need to add code to
> support it.  For block devices, a separate commit will wire it up too.
> 
> Signed-off-by: Eric Biggers <ebiggers@xxxxxxxxxx>
> ---
>  fs/stat.c                 | 2 ++
>  include/linux/stat.h      | 2 ++
>  include/uapi/linux/stat.h | 4 +++-
>  3 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/stat.c b/fs/stat.c
> index 9ced8860e0f35..a7930d7444830 100644
> --- a/fs/stat.c
> +++ b/fs/stat.c
> @@ -611,6 +611,8 @@ cp_statx(const struct kstat *stat, struct statx __user *buffer)
>  	tmp.stx_dev_major = MAJOR(stat->dev);
>  	tmp.stx_dev_minor = MINOR(stat->dev);
>  	tmp.stx_mnt_id = stat->mnt_id;
> +	tmp.stx_dio_mem_align = stat->dio_mem_align;
> +	tmp.stx_dio_offset_align = stat->dio_offset_align;
>  
>  	return copy_to_user(buffer, &tmp, sizeof(tmp)) ? -EFAULT : 0;
>  }
> diff --git a/include/linux/stat.h b/include/linux/stat.h
> index 7df06931f25d8..ff277ced50e9f 100644
> --- a/include/linux/stat.h
> +++ b/include/linux/stat.h
> @@ -50,6 +50,8 @@ struct kstat {
>  	struct timespec64 btime;			/* File creation time */
>  	u64		blocks;
>  	u64		mnt_id;
> +	u32		dio_mem_align;
> +	u32		dio_offset_align;

Hmm.  Does the XFS port of XFS_IOC_DIOINFO to STATX_DIOALIGN look like this?

	struct xfs_buftarg	*target = xfs_inode_buftarg(ip);

	kstat.dio_mem_align = target->bt_logical_sectorsize;
	kstat.dio_offset_align = target->bt_logical_sectorsize;
	kstat.result_mask |= STATX_DIOALIGN;

And I guess you're tabling the "optimal" IO discussions for now, because
there are too many variants of what that means?

--D

> };
>  
>  #endif
> diff --git a/include/uapi/linux/stat.h b/include/uapi/linux/stat.h
> index 1500a0f58041a..7cab2c65d3d7f 100644
> --- a/include/uapi/linux/stat.h
> +++ b/include/uapi/linux/stat.h
> @@ -124,7 +124,8 @@ struct statx {
>  	__u32	stx_dev_minor;
>  	/* 0x90 */
>  	__u64	stx_mnt_id;
> -	__u64	__spare2;
> +	__u32	stx_dio_mem_align;	/* Memory buffer alignment for direct I/O */
> +	__u32	stx_dio_offset_align;	/* File offset alignment for direct I/O */
>  	/* 0xa0 */
>  	__u64	__spare3[12];	/* Spare space for future expansion */
>  	/* 0x100 */
> @@ -152,6 +153,7 @@ struct statx {
>  #define STATX_BASIC_STATS	0x000007ffU	/* The stuff in the normal stat struct */
>  #define STATX_BTIME		0x00000800U	/* Want/got stx_btime */
>  #define STATX_MNT_ID		0x00001000U	/* Got stx_mnt_id */
> +#define STATX_DIOALIGN		0x00002000U	/* Want/got direct I/O alignment info */
>  
>  #define STATX__RESERVED		0x80000000U	/* Reserved for future struct statx expansion */
>  
> -- 
> 2.36.1
> 
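
For readers following along from userspace, here is a rough sketch of how a
caller might consume the two fields described in the commit message.  It is
not part of the patch.  The names query_dio_align(), dio_request_ok() and
struct dio_align are hypothetical; the flag value and the field offsets
(0x98 and 0x9c, where __spare2 used to live) are taken from the quoted diff.
The raw syscall and hard-coded offsets are only there so that the sketch
builds against headers that predate the patch; with new enough headers one
would presumably use struct statx directly.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Value from the quoted patch; defined here in case headers lack it. */
#ifndef STATX_DIOALIGN
#define STATX_DIOALIGN 0x00002000U
#endif

/* Byte offsets within struct statx, as laid out in the quoted uapi diff. */
enum {
	STX_MASK_OFF             = 0x00,
	STX_DIO_MEM_ALIGN_OFF    = 0x98,
	STX_DIO_OFFSET_ALIGN_OFF = 0x9c,
	STATX_BUF_SIZE           = 0x100,
};

/* Hypothetical container for the two alignment values. */
struct dio_align {
	uint32_t mem_align;	/* 0 means DIO is not supported */
	uint32_t offset_align;	/* 0 means DIO is not supported */
};

/*
 * Returns 1 if the kernel filled in the DIO alignment fields, 0 if
 * STATX_DIOALIGN was not set in the returned mask (fields are then
 * meaningless), and -1 if the statx() call itself failed.
 */
int query_dio_align(const char *path, struct dio_align *out)
{
	unsigned char buf[STATX_BUF_SIZE];
	uint32_t mask;

	assert(path != NULL && out != NULL);
	memset(buf, 0, sizeof(buf));

	if (syscall(SYS_statx, AT_FDCWD, path, 0, STATX_DIOALIGN, buf) != 0)
		return -1;

	memcpy(&mask, buf + STX_MASK_OFF, sizeof(mask));
	if (!(mask & STATX_DIOALIGN))
		return 0;

	memcpy(&out->mem_align, buf + STX_DIO_MEM_ALIGN_OFF,
	       sizeof(out->mem_align));
	memcpy(&out->offset_align, buf + STX_DIO_OFFSET_ALIGN_OFF,
	       sizeof(out->offset_align));
	return 1;
}

/*
 * Checks one I/O segment against the reported alignments: the user
 * buffer against mem_align, the file offset and length against
 * offset_align.  Returns 0 when DIO is reported as unsupported.
 */
int dio_request_ok(const struct dio_align *a, const void *buf,
		   uint64_t offset, uint64_t len)
{
	if (a->mem_align == 0 || a->offset_align == 0)
		return 0;

	return (uintptr_t)buf % a->mem_align == 0 &&
	       offset % a->offset_align == 0 &&
	       len % a->offset_align == 0;
}
```

A caller would typically fall back to buffered I/O when query_dio_align()
returns anything other than 1, or when it reports zero alignments.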