On Fri, Jul 22, 2022 at 08:43:55AM -0600, Keith Busch wrote: > On Fri, Jul 22, 2022 at 12:36:01AM -0700, Eric Biggers wrote: > > [+f2fs list and maintainers] > > > > On Fri, Jun 10, 2022 at 12:58:30PM -0700, Keith Busch wrote: > > > From: Keith Busch <kbusch@xxxxxxxxxx> > > > > > > Use the address alignment requirements from the block_device for direct > > > io instead of requiring addresses be aligned to the block size. > > > > > > Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx> > > > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > > > --- > > > fs/iomap/direct-io.c | 4 ++-- > > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > > > diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c > > > index 370c3241618a..5d098adba443 100644 > > > --- a/fs/iomap/direct-io.c > > > +++ b/fs/iomap/direct-io.c > > > @@ -242,7 +242,6 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > struct inode *inode = iter->inode; > > > unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev)); > > > unsigned int fs_block_size = i_blocksize(inode), pad; > > > - unsigned int align = iov_iter_alignment(dio->submit.iter); > > > loff_t length = iomap_length(iter); > > > loff_t pos = iter->pos; > > > unsigned int bio_opf; > > > @@ -253,7 +252,8 @@ static loff_t iomap_dio_bio_iter(const struct iomap_iter *iter, > > > size_t copied = 0; > > > size_t orig_count; > > > > > > - if ((pos | length | align) & ((1 << blkbits) - 1)) > > > + if ((pos | length) & ((1 << blkbits) - 1) || > > > + !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter)) > > > return -EINVAL; > > > > > > if (iomap->type == IOMAP_UNWRITTEN) { > > > > I noticed that this patch is going to break the following logic in > > f2fs_should_use_dio() in fs/f2fs/file.c: > > > > /* > > * Direct I/O not aligned to the disk's logical_block_size will be > > * attempted, but will fail with -EINVAL. > > * > > * f2fs additionally requires that direct I/O be aligned to the > > * filesystem block size, which is often a stricter requirement. > > * However, f2fs traditionally falls back to buffered I/O on requests > > * that are logical_block_size-aligned but not fs-block aligned. > > * > > * The below logic implements this behavior. > > */ > > align = iocb->ki_pos | iov_iter_alignment(iter); > > if (!IS_ALIGNED(align, i_blocksize(inode)) && > > IS_ALIGNED(align, bdev_logical_block_size(inode->i_sb->s_bdev))) > > return false; > > > > return true; > > > > So, f2fs assumes that __iomap_dio_rw() returns an error if the I/O isn't logical > > block aligned. This patch changes that. The result is that DIO will sometimes > > proceed in cases where the I/O doesn't have the fs block alignment required by > > f2fs for all DIO. > > > > Does anyone have any thoughts about what f2fs should be doing here? I think > > it's weird that f2fs has different behaviors for different degrees of > > misalignment: fail with EINVAL if not logical block aligned, else fallback to > > buffered I/O if not fs block aligned. I think it should be one convention or > > the other. Any opinions about which one it should be? > > It looks like f2fs just falls back to buffered IO for this condition without > reaching the new code in iomap_dio_bio_iter(). No. It's a bit subtle, so read the code and what I'm saying carefully. f2fs only supports 4K aligned DIO and normally falls back to buffered I/O; however, for DIO that is *very* misaligned (not even LBS aligned) it returns EINVAL instead. And it relies on __iomap_dio_rw() returning that EINVAL. Relying on __iomap_dio_rw() in that way is definitely a bad design on f2fs's part (and I messed that up when switching f2fs from fs/direct-io.c to iomap). The obvious fix is to just have f2fs do the LBS alignment check itself. But I think that f2fs shouldn't have different behavior for different levels of misalignment in the first place, so I was wondering if anyone had any thoughts on which behavior (EINVAL or fallback to buffered I/O) should be standardized on in all cases, at least for f2fs. There was some discussion about this sort of thing for ext4 several years ago on the thread https://lore.kernel.org/linux-ext4/1461472078-20104-1-git-send-email-tytso@xxxxxxx/T/#u, but it didn't really reach a conclusion. I'm wondering if the f2fs maintainers have any thoughts about why the f2fs behavior is as it is. I.e. is it just accidental, or are there specific reasons... - Eric