On 8/21/19 8:35 AM, Brian Foster wrote:
> On Wed, Aug 21, 2019 at 06:38:19PM +1000, Dave Chinner wrote:
>> From: Dave Chinner <dchinner@xxxxxxxxxx>
>>
>> Memory we use to submit for IO needs strict alignment to the
>> underlying driver constraints. Worst case, this is 512 bytes. Given
>> that all allocations for IO are always a power of 2 multiple of 512
>> bytes, the kernel heap provides natural alignment for objects of
>> these sizes and that suffices.
>>
>> Until, of course, memory debugging of some kind is turned on (e.g.
>> red zones, poisoning, KASAN) and then the alignment of the heap
>> objects is thrown out the window. Then we get weird IO errors and
>> data corruption problems because drivers don't validate alignment
>> and do the wrong thing when passed unaligned memory buffers in bios.
>>
>> TO fix this, introduce kmem_alloc_io(), which will guarantee at least
>
> s/TO/To/
>
>> 512 byte alignment of buffers for IO, even if memory debugging
>> options are turned on. It is assumed that the minimum allocation
>> size will be 512 bytes, and that sizes will be power of 2 multiples
>> of 512 bytes.
>>
>> Use this everywhere we allocate buffers for IO.
>>
>> This no longer fails with log recovery errors when KASAN is enabled
>> due to the brd driver not handling unaligned memory buffers:
>>
>> # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test
>>
>> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
>> ---
>>  fs/xfs/kmem.c            | 61 +++++++++++++++++++++++++++++----------
>>  fs/xfs/kmem.h            |  1 +
>>  fs/xfs/xfs_buf.c         |  4 +--
>>  fs/xfs/xfs_log.c         |  2 +-
>>  fs/xfs/xfs_log_recover.c |  2 +-
>>  fs/xfs/xfs_trace.h       |  1 +
>>  6 files changed, 50 insertions(+), 21 deletions(-)
>>
>> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
>> index edcf393c8fd9..ec693c0fdcff 100644
>> --- a/fs/xfs/kmem.c
>> +++ b/fs/xfs/kmem.c
> ...
>> @@ -62,6 +56,39 @@ kmem_alloc_large(size_t size, xfs_km_flags_t flags)
>>  	return ptr;
>>  }
>>
>> +/*
>> + * Same as kmem_alloc_large, except we guarantee a 512 byte aligned buffer is
>> + * returned. vmalloc always returns an aligned region.
>> + */
>> +void *
>> +kmem_alloc_io(size_t size, xfs_km_flags_t flags)
>> +{
>> +	void	*ptr;
>> +
>> +	trace_kmem_alloc_io(size, flags, _RET_IP_);
>> +
>> +	ptr = kmem_alloc(size, flags | KM_MAYFAIL);
>> +	if (ptr) {
>> +		if (!((long)ptr & 511))
>> +			return ptr;
>> +		kfree(ptr);
>> +	}
>> +	return __kmem_vmalloc(size, flags);
>> +}
>
> Even though it is unfortunate, this seems like a quite reasonable and
> isolated temporary solution to the problem to me. The one concern I have
> is if/how much this could affect performance under certain
> circumstances. I realize that these callsites are isolated in the common
> scenario. Less common scenarios like sub-page block sizes (whether due
> to explicit mkfs time format or default configurations on larger page
> size systems) can fall into this path much more frequently, however.
>
> Since this implies some kind of vm debug option is enabled, performance
> itself isn't critical when this solution is active. But how bad is it in
> those cases where we might depend on this more heavily? Have you
> confirmed that the end configuration is still "usable," at least?
>
> I ask because the repeated alloc/free behavior can easily be avoided via
> something like an mp flag (which may require a tweak to the
> kmem_alloc_io() interface) to skip further kmem_alloc() calls from this
> path once we see one unaligned allocation. That assumes this behavior is
> tied to functionality that isn't dynamically configured at runtime, of
> course.

This seems like a good idea to me.

-Eric
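For illustration, here is a minimal userspace sketch of the two ideas in this thread: the patch's low-nine-bits alignment test with an aligned-allocator fallback, plus a once-set flag along the lines of Brian's mp-flag suggestion. The names (`is_io_aligned`, `alloc_io_buffer`, `seen_unaligned`) are hypothetical, and `malloc`/`posix_memalign` stand in for `kmem_alloc`/`__kmem_vmalloc`; this is not the kernel code itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Mirror of the patch's test: a buffer is usable for 512 byte sector IO
 * iff the low 9 bits of its address are zero. */
static int is_io_aligned(const void *ptr)
{
	return ((uintptr_t)ptr & 511) == 0;
}

/* Sketch of Brian's suggested flag: once the fast-path allocator hands
 * back an unaligned buffer, stop trying it. In the kernel this would
 * live on the xfs_mount rather than in a global. */
static int seen_unaligned;

static void *alloc_io_buffer(size_t size)
{
	void	*ptr;

	if (!seen_unaligned) {
		ptr = malloc(size);
		if (ptr) {
			if (is_io_aligned(ptr))
				return ptr;
			/* Debug padding broke alignment; remember that and
			 * skip the pointless alloc/free cycle next time. */
			seen_unaligned = 1;
			free(ptr);
		}
	}
	/* Fallback allocator that guarantees alignment, standing in for
	 * __kmem_vmalloc (vmalloc returns page-aligned memory). */
	if (posix_memalign(&ptr, 512, size) != 0)
		return NULL;
	return ptr;
}
```

Either way the caller always receives a 512-byte-aligned buffer; the flag only changes how often the fast path is retried after it has been observed to misalign.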