On Thu, Jun 20, 2024 at 09:36:42AM -0400, Kent Overstreet wrote:
> On Thu, Jun 20, 2024 at 09:21:57PM +0800, Hongbo Li wrote:
> > Support fallback to buffered I/O if the operation being performed on
> > unaligned length or offset. This may change the behavior for direct
> > I/O in some cases.
> >
> > [Before]
> > For length which aligned with 256 bytes (not SECTOR aligned) will
> > read failed under direct I/O.
> >
> > [After]
> > For length which aligned with 256 bytes (not SECTOR aligned) will
> > read the data successfully under direct I/O because it will fallback
> > to buffer I/O.

This is against the O_DIRECT requirements.

   O_DIRECT
       The O_DIRECT flag may impose alignment restrictions on the length
       and address of user-space buffers and the file offset of I/Os.
       In Linux alignment restrictions vary by filesystem and kernel
       version and might be absent entirely. The handling of misaligned
       O_DIRECT I/Os also varies; they can either fail with EINVAL or
       fall back to buffered I/O.

       Since Linux 6.1, O_DIRECT support and alignment restrictions for
       a file can be queried using statx(2), using the STATX_DIOALIGN
       flag. Support for STATX_DIOALIGN varies by filesystem; see
       statx(2).

       Some filesystems provide their own interfaces for querying
       O_DIRECT alignment restrictions, for example the XFS_IOC_DIOINFO
       operation in xfsctl(3). STATX_DIOALIGN should be used instead
       when it is available.

       If none of the above is available, then direct I/O support and
       alignment restrictions can only be assumed from known
       characteristics of the filesystem, the individual file, the
       underlying storage device(s), and the kernel version. In Linux
       2.4, most filesystems based on block devices require that the
       file offset and the length and memory address of all I/O segments
       be multiples of the filesystem block size (typically 4096 bytes).
       In Linux 2.6.0, this was relaxed to the logical block size of the
       block device (typically 512 bytes). A block device's logical
       block size can be determined using the ioctl(2) BLKSSZGET
       operation or from the shell using the command:

           blockdev --getss

> The catch is that struct bio - bvec_iter - represents addresses with a
> sector_t, and we'd want that to be a loff_t.
>
> That's something we should do anyways; everything else in struct bio can
> represent a byte-aligned io, bvec_iter.bi_sector is the only exception
> and fixing that would help in consolidating our various scatter-gather
> list data structures - but we'd need buy-in from Jens and Christoph
> before doing that.

I'm against it. Block devices only do sector-aligned IO and we should
not pretend otherwise.
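
To make the alignment rules quoted above concrete, here is a minimal
userspace sketch in C. It is an illustration under stated assumptions,
not code from this thread or from any filesystem: query_dio_align() and
dio_request_is_aligned() are hypothetical helper names, and the statx(2)
path is only compiled where the installed headers define STATX_DIOALIGN.
A zero alignment is treated as "direct I/O not supported", which is how
statx(2) documents stx_dio_mem_align and stx_dio_offset_align.

```c
#define _GNU_SOURCE
#include <fcntl.h>      /* AT_FDCWD */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/stat.h>   /* statx(), struct statx, STATX_DIOALIGN (if available) */
#include <sys/types.h>

/*
 * Hypothetical helper: ask the kernel for the direct I/O alignment
 * requirements of `path`. Returns 0 and fills both outputs on success,
 * -1 if statx() fails, the filesystem does not report STATX_DIOALIGN,
 * or the headers predate it.
 */
int query_dio_align(const char *path, uint32_t *mem_align,
                    uint32_t *offset_align)
{
#if defined(__linux__) && defined(STATX_DIOALIGN)
    struct statx stx;

    if (statx(AT_FDCWD, path, 0, STATX_DIOALIGN, &stx) != 0)
        return -1;
    if (!(stx.stx_mask & STATX_DIOALIGN))
        return -1;      /* filesystem did not fill in the fields */

    *mem_align = stx.stx_dio_mem_align;
    *offset_align = stx.stx_dio_offset_align;
    return 0;
#else
    (void)path;
    (void)mem_align;
    (void)offset_align;
    return -1;
#endif
}

/*
 * Hypothetical helper: true if a request satisfies the given alignment
 * requirements. The alignments may come from query_dio_align() or, as a
 * fallback assumption, from the logical block size (BLKSSZGET).
 */
bool dio_request_is_aligned(const void *buf, off_t offset, size_t len,
                            uint32_t mem_align, uint32_t offset_align)
{
    /* Zero means direct I/O is not supported on this file. */
    if (mem_align == 0 || offset_align == 0)
        return false;

    return (uintptr_t)buf % mem_align == 0 &&
           (uint64_t)offset % offset_align == 0 &&
           (uint64_t)len % offset_align == 0;
}
```

With 512-byte alignments, a 256-byte read fails this check, which is the
case the patch description is about. What happens to such a request
afterwards (EINVAL or a fallback to buffered I/O) is up to the
filesystem and is not modelled here.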