On Fri, Jul 15, 2016 at 05:18:02PM -0700, Calvin Owens wrote: > Hello all, > > I've found a nasty source of slab corruption. Based on seeing similar symptoms > on boxes at Facebook, I suspect it's been around since at least 3.10. > > It only reproduces under memory pressure so far as I can tell: the issue seems > to be that XFS reclaims pages from buffers that are still in use by > scsi/block. I'm not sure which side the bug lies on, but I've only observed it > with XFS. > > [67203.776421] ================================================================== > [67203.792521] BUG: KASAN: use-after-free in xfs_destroy_ioend+0x3bf/0x4c0 at addr ffff8804cf466288 > [67203.812036] Read of size 8 by task python2.7/22913 > [67203.822713] ============================================================================= > [67203.840917] BUG buffer_head (Not tainted): kasan: bad access detected > [67203.855253] ----------------------------------------------------------------------------- > [67203.855253] > [67203.876727] Disabling lock debugging due to kernel taint > [67203.888575] INFO: Allocated in 0xffff8804cf465d40 age=18437180719206552994 cpu=2191548261 pid=-1 > [67203.908139] alloc_buffer_head+0x22/0xd0 > [67203.916903] ___slab_alloc+0x4e0/0x520 > [67203.925286] __slab_alloc+0x43/0x70 > [67203.933087] kmem_cache_alloc+0x228/0x2c0 > [67203.942042] alloc_buffer_head+0x22/0xd0 > [67203.950782] alloc_page_buffers+0xa9/0x1f0 > [67203.959936] create_empty_buffers+0x30/0x420 > [67203.969495] create_page_buffers+0x120/0x1b0 > [67203.979029] __block_write_begin+0x16b/0x1010 > [67203.988756] xfs_vm_write_begin+0x55/0x1b0 > [67203.997884] generic_perform_write+0x288/0x510 > [67204.007771] xfs_file_buffered_aio_write+0x316/0x780 > [67204.018811] xfs_file_write_iter+0x26f/0x6c0 > [67204.028313] __vfs_write+0x2a0/0x620 > [67204.036276] vfs_write+0x159/0x4c0 > [67204.043855] SyS_write+0xd2/0x1b0 > [67204.051245] INFO: Freed in 0x103fc80ec age=18446651500051355200 cpu=2165122683 pid=-1 > [67204.068634] free_buffer_head+0x41/0x90 > [67204.077175] __slab_free+0x1ed/0x340 > [67204.085138] kmem_cache_free+0x270/0x300 > [67204.093867] free_buffer_head+0x41/0x90 > [67204.102422] try_to_free_buffers+0x171/0x240 > [67204.111925] xfs_vm_releasepage+0xcb/0x3b0 > [67204.121101] try_to_release_page+0x106/0x190 > [67204.130602] shrink_page_list+0x118e/0x1a10 > [67204.139910] shrink_inactive_list+0x42c/0xdf0 > [67204.149600] shrink_zone_memcg+0xa09/0xfa0 > [67204.158715] shrink_zone+0x2c3/0xbc0 > [67204.166679] do_try_to_free_pages+0x42a/0x12f0 > [67204.176562] try_to_free_pages+0x1a3/0x5d0 > [67204.185709] __alloc_pages_nodemask+0xbeb/0x20d0 > [67204.195979] alloc_pages_vma+0x11b/0x5e0 > [67204.204709] handle_mm_fault+0x2c27/0x47d0 > [67204.213823] INFO: Slab 0xffffea00133d1900 objects=37 used=14 fp=0xffff8804cf464530 flags=0x2000000000004080 > [67204.235439] INFO: Object 0xffff8804cf466260 @offset=8800 fp=0xbbbbbbbbbbbbbbbb Hmmm - can only get there with a clean page, so this should be fine. > [67204.455817] CPU: 1 PID: 22913 Comm: python2.7 Tainted: G B 4.7.0-rc7-calvinowens-1468357363-00001-gcaa3dc6 #1 > [67204.480313] Hardware name: Wiwynn HoneyBadger/PantherPlus, BIOS HBM6.71 02/03/2016 > [67204.497509] ffff88075e99f480 ffff88075ec87a30 ffffffff81e8b8e4 ffff8804cf464000 > [67204.514224] ffff8804cf466260 ffff88075ec87a60 ffffffff8153a995 ffff88075e99f480 > [67204.530924] ffffea00133d1900 ffff8804cf466260 dffffc0000000000 ffff88075ec87a88 > [67204.547624] Call Trace: > [67204.553086] <IRQ> [<ffffffff81e8b8e4>] dump_stack+0x68/0x94 > [67204.565946] [<ffffffff8153a995>] print_trailer+0x115/0x1a0 > [67204.578334] [<ffffffff81541174>] object_err+0x34/0x40 > [67204.589762] [<ffffffff815436e7>] kasan_report_error+0x217/0x530 > [67204.616847] [<ffffffff81543b33>] __asan_report_load8_noabort+0x43/0x50 > [67204.645085] [<ffffffff819d651f>] xfs_destroy_ioend+0x3bf/0x4c0 > [67204.658243] [<ffffffff819d69d4>] xfs_end_bio+0x154/0x220 > [67204.685362] [<ffffffff81de0c58>] bio_endio+0x158/0x1b0 > [67204.696983] [<ffffffff81dff61b>] blk_update_request+0x18b/0xb80 > [67204.710334] [<ffffffff821baf57>] scsi_end_request+0x97/0x5a0 > [67204.723108] [<ffffffff821c5558>] scsi_io_completion+0x438/0x1690 > [67204.807293] [<ffffffff821a8d95>] scsi_finish_command+0x375/0x4e0 > [67204.820838] [<ffffffff821c3940>] scsi_softirq_done+0x280/0x340 > [67204.848884] [<ffffffff81e1e13f>] blk_done_softirq+0x1ff/0x360 > [67204.875074] [<ffffffff82a08cad>] __do_softirq+0x22d/0x8d7 > [67204.887270] [<ffffffff8113d42c>] irq_exit+0x15c/0x190 > [67204.898697] [<ffffffff82a085e3>] smp_apic_timer_interrupt+0x83/0xa0 > [67204.912815] [<ffffffff82a06b29>] apic_timer_interrupt+0x89/0x90 > [67205.029113] ================================================================== But this indicates that the page is under writeback at this point, so that tends to indicate that the above freeing was incorrect. Hmmm - it's clear we've got direct reclaim involved here, and the suspicion of a dirty page that has had it's bufferheads cleared. Are there any other warnings in the log from XFS prior to kasan throwing the error? > Is there anything else I can send that might be helpful? full console/dmesg output from a crashed machine, plus: http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F > -- > /* > * Run as "./repro outfile 1000", where "outfile" sits on an XFS filesystem. > */ > > #include <stdlib.h> > #include <stdio.h> > #include <fcntl.h> > #include <signal.h> > #include <unistd.h> > #include <sys/stat.h> > #include <sys/types.h> > #include <sys/mman.h> > > #define CHUNK (32768) > > static const char crap[CHUNK]; > > int main(int argc, char **argv) > { > int r, fd, i; > size_t allocsize, count; > void *p; > > if (argc != 3) { > printf("Usage: %s filename count\n", argv[0]); > return 1; > } > > fd = open(argv[1], O_RDWR|O_CREAT, 0644); > if (fd == -1) { > perror("Can't open"); > return 1; > } > > if (!fork()) { > count = atol(argv[2]); > > while (1) { > for (i = 0; i < count; i++) > if (write(fd, crap, CHUNK) != CHUNK) > perror("Eh?"); > > fsync(fd); > ftruncate(fd, 0); > } Hmmmm. Truncate is used, but only after fsync. If the truncate is removed, does the problem go away? Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html