Re: [PATCH 2/3] xfs: add kmem_alloc_io()

On Wed, Aug 21, 2019 at 08:08:01AM -0700, Darrick J. Wong wrote:
> On Wed, Aug 21, 2019 at 09:35:33AM -0400, Brian Foster wrote:
> > On Wed, Aug 21, 2019 at 06:38:19PM +1000, Dave Chinner wrote:
> > > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > > 
> > > Memory we use to submit for IO needs strict alignment to the
> > > underlying driver constraints. Worst case, this is 512 bytes. Given
> > > that all allocations for IO are always a power of 2 multiple of 512
> > > bytes, the kernel heap provides natural alignment for objects of
> > > these sizes and that suffices.
> > > 
> > > Until, of course, memory debugging of some kind is turned on (e.g.
> > > red zones, poisoning, KASAN) and then the alignment of the heap
> > > objects is thrown out the window. Then we get weird IO errors and
> > > data corruption problems because drivers don't validate alignment
> > > and do the wrong thing when passed unaligned memory buffers in bios.
> > > 
> > > TO fix this, introduce kmem_alloc_io(), which will guarantee at least
> > 
> > s/TO/To/
> > 
> > > 512 byte alignment of buffers for IO, even if memory debugging
> > > options are turned on. It is assumed that the minimum allocation
> > > size will be 512 bytes, and that sizes will be power of 2 multiples
> > > of 512 bytes.
> > > 
> > > Use this everywhere we allocate buffers for IO.
> > > 
> > > This no longer fails with log recovery errors when KASAN is enabled
> > > due to the brd driver not handling unaligned memory buffers:
> > > 
> > > # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test
> > > 
> > > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/kmem.c            | 61 +++++++++++++++++++++++++++++-----------
> > >  fs/xfs/kmem.h            |  1 +
> > >  fs/xfs/xfs_buf.c         |  4 +--
> > >  fs/xfs/xfs_log.c         |  2 +-
> > >  fs/xfs/xfs_log_recover.c |  2 +-
> > >  fs/xfs/xfs_trace.h       |  1 +
> > >  6 files changed, 50 insertions(+), 21 deletions(-)
> > > 
> > > diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
> > > index edcf393c8fd9..ec693c0fdcff 100644
> > > --- a/fs/xfs/kmem.c
> > > +++ b/fs/xfs/kmem.c
> > ...
> > > @@ -62,6 +56,39 @@ kmem_alloc_large(size_t size, xfs_km_flags_t flags)
> > >  	return ptr;
> > >  }
> > >  
> > > +/*
> > > + * Same as kmem_alloc_large, except we guarantee a 512 byte aligned buffer is
> > > + * returned. vmalloc always returns an aligned region.
> > > + */
> > > +void *
> > > +kmem_alloc_io(size_t size, xfs_km_flags_t flags)
> > > +{
> > > +	void	*ptr;
> > > +
> > > +	trace_kmem_alloc_io(size, flags, _RET_IP_);
> > > +
> > > +	ptr = kmem_alloc(size, flags | KM_MAYFAIL);
> > > +	if (ptr) {
> > > +		if (!((long)ptr & 511))
> 
> Er... didn't you say "it needs to grab the alignment from
> [blk_]queue_dma_alignment(), not use a hard coded value of 511"?

That's fine for the bio, which has a direct pointer to the request
queue. Here the allocation may be a long way away from the IO
itself, and we might actually be slicing and dicing an allocated
region into a bio rather than adding the whole region itself.

So I've just taken the worst case: queue_dma_alignment() returns
511 if no queue is supplied, so this is the worst-case alignment
that the block device will require. We don't need any more than that
to fix the problem right now, and getting the alignment into this
function in all cases makes it a bit more complicated...
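
A minimal userspace sketch of that worst-case check (not the kernel code itself), assuming a fixed 512-byte alignment that mirrors the 511 mask queue_dma_alignment() falls back to. It also shows why slicing matters: cutting a 512-byte-aligned region at 512-byte offsets keeps every slice aligned. IO_ALIGN, io_ptr_aligned(), alloc_sectors() and io_slice() are hypothetical names, and C11 aligned_alloc() stands in for the kernel heap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical worst-case IO alignment: 512 bytes, i.e. a mask of 511. */
#define IO_ALIGN	512u
#define IO_ALIGN_MASK	(IO_ALIGN - 1)

/* True if p satisfies the worst-case 512-byte alignment. */
static bool io_ptr_aligned(const void *p)
{
	return ((uintptr_t)p & IO_ALIGN_MASK) == 0;
}

/* Allocate nsectors * 512 bytes, 512-byte aligned (C11 aligned_alloc). */
static void *alloc_sectors(size_t nsectors)
{
	assert(nsectors > 0);
	return aligned_alloc(IO_ALIGN, nsectors * IO_ALIGN);
}

/*
 * Return the i-th 512-byte slice of buf. Offsets are multiples of 512,
 * so slices of an aligned buffer are themselves aligned.
 */
static void *io_slice(void *buf, size_t i)
{
	return (char *)buf + i * IO_ALIGN;
}
```

A pointer shifted by a small amount, for example 16 bytes of the kind a debug header could introduce, fails io_ptr_aligned() even though the buffer it points into is aligned.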

> How is this different?  If this buffer is really for IO then shouldn't
> we pass in the buftarg or something so that we find the real alignment?
> Or check it down in the xfs_buf code that associates a page to a buffer?

No, at worst we should pass in the alignment; there is no good reason to
be passing buftargs, block devices or request queues into memory
allocation APIs. I'll have a look at this.
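
One way to read "pass in the alignment" is a hypothetical userspace analogue of kmem_alloc_io() that takes the alignment mask as a parameter instead of hard-coding 511. In this sketch malloc() stands in for kmem_alloc() and posix_memalign() for the __kmem_vmalloc() fallback; alloc_io_aligned() is an illustrative name, not an XFS API.

```c
#define _POSIX_C_SOURCE 200112L	/* for posix_memalign() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical analogue of kmem_alloc_io() where the caller passes the
 * alignment mask (e.g. 511) rather than it being hard-coded.
 */
static void *alloc_io_aligned(size_t size, size_t align_mask)
{
	size_t align = align_mask + 1;
	void *ptr;

	/* The mask must describe a power-of-2 alignment usable by posix_memalign. */
	assert((align & align_mask) == 0);
	assert(align >= sizeof(void *));

	/* Fast path: the heap object is often naturally aligned already. */
	ptr = malloc(size);
	if (ptr) {
		if (!((uintptr_t)ptr & align_mask))
			return ptr;
		free(ptr);
	}

	/* Slow path: explicitly request an aligned region. */
	if (posix_memalign(&ptr, align, size) != 0)
		return NULL;
	return ptr;
}
```

Calling alloc_io_aligned(size, 511) reproduces the worst-case behaviour of the patch, while a caller wanting page alignment could pass 4095; the allocation API itself never sees a buftarg or request queue.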

> Even if all that logic is hidden behind CONFIG_XFS_DEBUG?

That's no good because memory debugging can be turned on without
CONFIG_XFS_DEBUG (which is how we tripped over the xenblk case).

> 
> > > +			return ptr;
> > > +		kfree(ptr);
> > > +	}
> > > +	return __kmem_vmalloc(size, flags);
> 
> How about checking the vmalloc alignment too?  If we're going to be this
> paranoid we might as well go all the way. :)

If vmalloc is returning unaligned regions, lots of stuff is going to
break everywhere. It has to be page aligned because of the PTE
mappings it requires. Memory debugging puts guard pages around
vmalloc regions; it doesn't change the alignment.
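
The page-alignment argument can be checked with a small userspace analogue, assuming an anonymous mmap() behaves like vmalloc in returning page-granular, page-aligned mappings. Because the page size is a power-of-2 multiple of 512, any page-aligned address also has its low 9 bits clear. map_pages() is a hypothetical helper.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: map npages anonymous pages. mmap() returns
 * page-aligned addresses, so the result is also 512-byte aligned.
 */
static void *map_pages(size_t npages, size_t *len_out)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p;

	/* Page alignment only implies 512-byte alignment if 512 divides it. */
	assert(page > 0 && (page & 511) == 0);
	*len_out = npages * (size_t)page;
	p = mmap(NULL, *len_out, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}
```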

Cheers,

Dave.

-- 
Dave Chinner
david@xxxxxxxxxxxxx


