Re: [PATCH] xfs: allocate sector sized IO buffer via page_frag_alloc

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 28 Feb 2019 08:38:39 +1100

On Tue, Feb 26, 2019 at 08:14:33AM -0800, Darrick J. Wong wrote:
> On Tue, Feb 26, 2019 at 06:04:40AM -0800, Matthew Wilcox wrote:
> > On Tue, Feb 26, 2019 at 09:42:48PM +0800, Ming Lei wrote:
> > > On Tue, Feb 26, 2019 at 05:02:30AM -0800, Matthew Wilcox wrote:
> > > > Wait, we're imposing a ridiculous amount of complexity on XFS for no
> > > > reason at all?  We should just change this to 512-byte alignment.  Tying
> > > > it to the blocksize of the device never made any sense.
> > > 
> > > OK, that is fine since we can fallback to buffered IO for loop in case of
> > > unaligned dio.
> > > 
> > > Then something like the following patch should work for all fs, could
> > > anyone comment on this approach?
> > 
> > That's not even close to what I meant.
> > 
> > diff --git a/fs/direct-io.c b/fs/direct-io.c
> > index ec2fb6fe6d37..dee1fc47a7fc 100644
> > --- a/fs/direct-io.c
> > +++ b/fs/direct-io.c
> > @@ -1185,18 +1185,20 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
> 
> Wait a minute, are you all saying that /directio/ is broken on XFS too??
> XFS doesn't use blockdev_direct_IO anymore.

No, loop devices w/ direct IO is a complete red herring. It's the
same issue - the upper filesystem is sending down unaligned bios
to the loop device, which is then just passing them on to the
underlying filesystem via DIO, and the DIO code in the lower
filesystem saying "that user memory buffer ain't aligned to my
logical sector size" and rejecting it.

Actually, in XFS's case, it doesn't care about memory buffer
alignment - it's the iomap code that is rejecting it when mapping
the memory buffer to a bio in iomap_dio_bio_actor():

	unsigned int blkbits = blksize_bits(bdev_logical_block_size(iomap->bdev));
.....
	unsigned int align = iov_iter_alignment(dio->submit.iter);
.....
	if ((pos | length | align) & ((1 << blkbits) - 1))
		return -EINVAL;

IOWs, if the memory buffer is not aligned to the logical block size
of the underlying device (which defaults to 512 bytes) then it will
be rejected by the lower filesystem...

> I thought we were talking about alignment of XFS metadata buffers
> (xfs_buf.c), which is a very different topic.

Yup, it's the same problem, just a different symptom. Fix the
unaligned buffer problem, we fix the loop dev w/ direct-io problem,
too.

FWIW, this "filesystem image sector size mismatch" problem has been
around for many, many years - the same issue that occurs with loop
devices when you try to mount a 512 byte sector image on a hard 4k
sector host filesystem/storage device using direct IO in the loop
device. This isn't a new thing at all - if you want to use direct IO
to manipulate filesystem images, you actually need to know what you
are doing....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx