Re: bvec_iter.bi_sector -> loff_t? (was: Re: [PATCH] bcachefs: allow direct io fallback to buffer io for unaligned length or offset)

On Thu, Jun 20, 2024 at 02:54:09PM +0100, Matthew Wilcox wrote:
> On Thu, Jun 20, 2024 at 09:36:42AM -0400, Kent Overstreet wrote:
> > On Thu, Jun 20, 2024 at 09:21:57PM +0800, Hongbo Li wrote:
> > > Support falling back to buffered I/O if the operation is performed
> > > with an unaligned length or offset. This may change the behavior of
> > > direct I/O in some cases.
> > > 
> > > [Before]
> > > A read whose length is aligned to 256 bytes (not SECTOR aligned)
> > > fails under direct I/O.
> > > 
> > > [After]
> > > A read whose length is aligned to 256 bytes (not SECTOR aligned)
> > > succeeds under direct I/O because it falls back to buffered I/O.
> 
> This is against the O_DIRECT requirements.
> 
>    O_DIRECT
>        The O_DIRECT flag may impose alignment restrictions on  the  length  and
>        address  of  user-space  buffers  and the file offset of I/Os.  In Linux
>        alignment restrictions vary by filesystem and kernel version  and  might
>        be  absent  entirely.   The  handling  of  misaligned O_DIRECT I/Os also
>        varies; they can either fail with EINVAL or fall back to buffered I/O.
> 
>        Since Linux 6.1, O_DIRECT support and alignment restrictions for a  file
>        can  be  queried using statx(2), using the STATX_DIOALIGN flag.  Support
>        for STATX_DIOALIGN varies by filesystem; see statx(2).
> 
>        Some filesystems provide their  own  interfaces  for  querying  O_DIRECT
>        alignment restrictions, for example the XFS_IOC_DIOINFO operation in xf‐
>        sctl(3).  STATX_DIOALIGN should be used instead when it is available.
> 
>        If none of the above is available, then direct I/O support and alignment
>        restrictions  can  only  be  assumed  from  known characteristics of the
>        filesystem, the individual file, the underlying storage  device(s),  and
>        the  kernel  version.  In Linux 2.4, most filesystems based on block de‐
>        vices require that the file offset and the length and memory address  of
>        all  I/O  segments  be multiples of the filesystem block size (typically
>        4096 bytes).  In Linux 2.6.0, this was relaxed to the logical block size
>        of the block device (typically 512 bytes).   A  block  device's  logical
>        block  size  can be determined using the ioctl(2) BLKSSZGET operation or
>        from the shell using the command:

That's really just descriptive, not prescriptive.

The intent of O_DIRECT is "bypass the page cache"; the alignment
restrictions are just a side effect of that. What applications care
about is having predictable performance characteristics.
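
To make the alignment question concrete, here is a rough userspace sketch
of the check an application (or a filesystem deciding whether to fall
back) would perform. dio_is_aligned() is a hypothetical helper, not a
kernel or libc interface, and it assumes a single power-of-two alignment
applies to offset, length and buffer address alike, which need not hold
on every filesystem:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a kernel or libc API): returns true if the
 * file offset, the I/O length and the buffer address are all multiples
 * of 'align'. Assumes one alignment value covers all three.
 */
static bool dio_is_aligned(uint64_t offset, size_t len, const void *buf,
                           uint32_t align)
{
    uint64_t mask;

    /* Reject zero and non-power-of-two alignments. */
    if (align == 0 || (align & (align - 1)) != 0)
        return false;

    mask = (uint64_t)align - 1;

    /* OR the three values together: any low bit set means misaligned. */
    return ((offset | (uint64_t)len | (uint64_t)(uintptr_t)buf) & mask) == 0;
}
```

With a 512-byte alignment, the 256-byte length from the patch description
fails this check, which is the case where the filesystem must either
return EINVAL or fall back to buffered I/O.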

> > The catch is that struct bio - bvec_iter - represents addresses with a
> > sector_t, and we'd want that to be a loff_t.
> > 
> > That's something we should do anyways; everything else in struct bio can
> > represent a byte-aligned io, bvec_iter.bi_sector is the only exception
> > and fixing that would help in consolidating our various scatter-gather
> > list data structures - but we'd need buy-in from Jens and Christoph
> > before doing that.
> 
> I'm against it.  Block devices only do sector-aligned IO and we should
> not pretend otherwise.

Eh?

bio isn't really specific to the block layer anyways, given that an
iov_iter can be a bio underneath. We _really_ should be trying for
better commonality of data structures.
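
For anyone following along, the representational gap being discussed
looks roughly like this. It is a userspace sketch only: sector_sketch_t
and both helpers are made-up names standing in for sector_t and the usual
shift by SECTOR_SHIFT, and nothing here is actual block layer code:

```c
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9 /* bi_sector counts 512-byte sectors */

typedef uint64_t sector_sketch_t; /* stand-in for the kernel's sector_t */

/*
 * Hypothetical helper: convert a byte offset to a sector number.
 * Fails when the offset is not a multiple of 512, because a sector
 * count has no way to carry the low 9 bits.
 */
static bool byte_off_to_sector(uint64_t byte_off, sector_sketch_t *out)
{
    if (byte_off & ((UINT64_C(1) << SECTOR_SHIFT) - 1))
        return false;

    *out = byte_off >> SECTOR_SHIFT;
    return true;
}

/* Hypothetical helper: the reverse direction is always exact. */
static uint64_t sector_to_byte_off(sector_sketch_t sector)
{
    return sector << SECTOR_SHIFT;
}
```

A byte offset of 4096 maps to sector 8, while an offset of 256 has no
sector representation at all; an loff_t position would carry both.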