On Thu, Jun 20, 2024 at 02:54:09PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 20, 2024 at 09:36:42AM -0400, Kent Overstreet wrote:
> > On Thu, Jun 20, 2024 at 09:21:57PM +0800, Hongbo Li wrote:
> > > Support falling back to buffered I/O if the operation is performed on
> > > an unaligned length or offset. This may change the behavior of direct
> > > I/O in some cases.
> > >
> > > [Before]
> > > A read whose length is aligned to 256 bytes (not sector aligned)
> > > fails under direct I/O.
> > >
> > > [After]
> > > A read whose length is aligned to 256 bytes (not sector aligned)
> > > succeeds under direct I/O, because it falls back to buffered I/O.
> >
> > This is against the O_DIRECT requirements.
>
> O_DIRECT
> The O_DIRECT flag may impose alignment restrictions on the length and
> address of user-space buffers and the file offset of I/Os. In Linux
> alignment restrictions vary by filesystem and kernel version and might
> be absent entirely. The handling of misaligned O_DIRECT I/Os also
> varies; they can either fail with EINVAL or fall back to buffered I/O.
>
> Since Linux 6.1, O_DIRECT support and alignment restrictions for a file
> can be queried using statx(2), using the STATX_DIOALIGN flag. Support
> for STATX_DIOALIGN varies by filesystem; see statx(2).
>
> Some filesystems provide their own interfaces for querying O_DIRECT
> alignment restrictions, for example the XFS_IOC_DIOINFO operation in
> xfsctl(3). STATX_DIOALIGN should be used instead when it is available.
>
> If none of the above is available, then direct I/O support and alignment
> restrictions can only be assumed from known characteristics of the
> filesystem, the individual file, the underlying storage device(s), and
> the kernel version. In Linux 2.4, most filesystems based on block
> devices require that the file offset and the length and memory address of
> all I/O segments be multiples of the filesystem block size (typically
> 4096 bytes).
> In Linux 2.6.0, this was relaxed to the logical block size
> of the block device (typically 512 bytes). A block device's logical
> block size can be determined using the ioctl(2) BLKSSZGET operation or
> from the shell using the command:

That's really just descriptive, not prescriptive. The intent of O_DIRECT is
"bypass the page cache"; the alignment restrictions are just a side effect
of that. What applications care about is having predictable performance
characteristics.

> > The catch is that struct bio - bvec_iter - represents addresses with a
> > sector_t, and we'd want that to be a loff_t.
> >
> > That's something we should do anyways; everything else in struct bio can
> > represent a byte-aligned io, bvec_iter.bi_sector is the only exception
> > and fixing that would help in consolidating our various scatter-gather
> > list data structures - but we'd need buy-in from Jens and Christoph
> > before doing that.
>
> I'm against it. Block devices only do sector-aligned IO and we should
> not pretend otherwise.

Eh? bio isn't really specific to the block layer anyways, given that an
iov_iter can be a bio underneath. We _really_ should be trying for better
commonality of data structures.