On 8/21/19 8:35 AM, Brian Foster wrote:
> On Wed, Aug 21, 2019 at 06:38:19PM +1000, Dave Chinner wrote:
>> From: Dave Chinner <dchinner@xxxxxxxxxx>
>>
>> Memory we use to submit for IO needs strict alignment to the
>> underlying driver constraints. Worst case, this is 512 bytes. Given
>> that all allocations for IO are always a power of 2 multiple of 512
>> bytes, the kernel heap provides natural alignment for objects of
>> these sizes and that suffices.
>>
>> Until, of course, memory debugging of some kind is turned on (e.g.
>> red zones, poisoning, KASAN) and then the alignment of the heap
>> objects is thrown out the window. Then we get weird IO errors and
>> data corruption problems because drivers don't validate alignment
>> and do the wrong thing when passed unaligned memory buffers in bios.
>>
>> TO fix this, introduce kmem_alloc_io(), which will guarantee at least
>
> s/TO/To/
>
>> 512 byte alignment of buffers for IO, even if memory debugging
>> options are turned on. It is assumed that the minimum allocation
>> size will be 512 bytes, and that sizes will be power of 2 multiples
>> of 512 bytes.
>>
>> Use this everywhere we allocate buffers for IO.
>>
>> This no longer fails with log recovery errors when KASAN is enabled
>> due to the brd driver not handling unaligned memory buffers:
>>
>> # mkfs.xfs -f /dev/ram0 ; mount /dev/ram0 /mnt/test
>>
>> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
>> ---
>>  fs/xfs/kmem.c            | 61 +++++++++++++++++++++++++++++----------
>>  fs/xfs/kmem.h            |  1 +
>>  fs/xfs/xfs_buf.c         |  4 +--
>>  fs/xfs/xfs_log.c         |  2 +-
>>  fs/xfs/xfs_log_recover.c |  2 +-
>>  fs/xfs/xfs_trace.h       |  1 +
>>  6 files changed, 50 insertions(+), 21 deletions(-)
>>
>> diff --git a/fs/xfs/kmem.c b/fs/xfs/kmem.c
>> index edcf393c8fd9..ec693c0fdcff 100644
>> --- a/fs/xfs/kmem.c
>> +++ b/fs/xfs/kmem.c
> ...
>> @@ -62,6 +56,39 @@ kmem_alloc_large(size_t size, xfs_km_flags_t flags)
>>  	return ptr;
>>  }
>>
>> +/*
>> + * Same as kmem_alloc_large, except we guarantee a 512 byte aligned buffer is
>> + * returned. vmalloc always returns an aligned region.
>> + */
>> +void *
>> +kmem_alloc_io(size_t size, xfs_km_flags_t flags)
>> +{
>> +	void	*ptr;
>> +
>> +	trace_kmem_alloc_io(size, flags, _RET_IP_);
>> +
>> +	ptr = kmem_alloc(size, flags | KM_MAYFAIL);
>> +	if (ptr) {
>> +		if (!((long)ptr & 511))
>> +			return ptr;
>> +		kfree(ptr);
>> +	}
>> +	return __kmem_vmalloc(size, flags);
>> +}
>
> Even though it is unfortunate, this seems like a quite reasonable and
> isolated temporary solution to the problem to me. The one concern I have
> is if/how much this could affect performance under certain
> circumstances. I realize that these callsites are isolated in the common
> scenario. Less common scenarios like sub-page block sizes (whether due
> to explicit mkfs time format or default configurations on larger page
> size systems) can fall into this path much more frequently, however.
>
> Since this implies some kind of vm debug option is enabled, performance
> itself isn't critical when this solution is active. But how bad is it in
> those cases where we might depend on this more heavily? Have you
> confirmed that the end configuration is still "usable," at least?
>
> I ask because the repeated alloc/free behavior can easily be avoided via
> something like an mp flag (which may require a tweak to the
> kmem_alloc_io() interface) to skip further kmem_alloc() calls from this
> path once we see one unaligned allocation. That assumes this behavior is
> tied to functionality that isn't dynamically configured at runtime, of
> course.

This seems like a good idea to me.

-Eric
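For illustration, here is a minimal userspace sketch of the two ideas in this thread: the patch's low-nine-bits alignment test with an aligned-allocator fallback, plus a once-set flag along the lines of Brian's mp-flag suggestion. The names (`is_io_aligned`, `alloc_io_buffer`, `seen_unaligned`) are hypothetical, and `malloc`/`posix_memalign` stand in for `kmem_alloc`/`__kmem_vmalloc`; this is not the kernel code itself.

```c
#include <stdint.h>
#include <stdlib.h>

/* Mirror of the patch's test: a buffer is usable for 512 byte sector IO
 * iff the low 9 bits of its address are zero. */
static int is_io_aligned(const void *ptr)
{
	return ((uintptr_t)ptr & 511) == 0;
}

/* Sketch of Brian's suggested flag: once the fast-path allocator hands
 * back an unaligned buffer, stop trying it. In the kernel this would
 * live on the xfs_mount rather than in a global. */
static int seen_unaligned;

static void *alloc_io_buffer(size_t size)
{
	void	*ptr;

	if (!seen_unaligned) {
		ptr = malloc(size);
		if (ptr) {
			if (is_io_aligned(ptr))
				return ptr;
			/* Debug padding broke alignment; remember that and
			 * skip the pointless alloc/free cycle next time. */
			seen_unaligned = 1;
			free(ptr);
		}
	}
	/* Fallback allocator that guarantees alignment, standing in for
	 * __kmem_vmalloc (vmalloc returns page-aligned memory). */
	if (posix_memalign(&ptr, 512, size) != 0)
		return NULL;
	return ptr;
}
```

Either way the caller always receives a 512-byte-aligned buffer; the flag only changes how often the fast path is retried after it has been observed to misalign.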