On Mon, Aug 19, 2019 at 06:40:12AM +0200, hch@xxxxxx wrote: > On Mon, Aug 19, 2019 at 06:29:05AM +0200, hch@xxxxxx wrote: > > On Mon, Aug 19, 2019 at 02:22:59PM +1000, Dave Chinner wrote: > > > That implies a kmalloc heap issue. > > > > > > Oh, is memory poisoning or something that modifies the alignment of > > > slabs turned on? > > > > > > i.e. 4k/8k allocations from the kmalloc heap slabs might not be > > > appropriately aligned for IO, similar to the problems we have with > > > the xen blk driver? > > > > That is what I suspect, and as you can see in the attached config I > > usually run with slab debuggig on. > > Yep, looks like an unaligned allocation: *nod* > With the following debug patch. Based on that I think I'll just > formally submit the vmalloc switch as we're at -rc5, and then we > can restart the unaligned slub allocation drama.. But I'm not so sure that's all there is to it. I turned KASAN on and it works for the first few mkfs/mount cycles, then a mount basically pegs a CPU and it's spending most of it's time inside KASAN accouting code like this: ..... save_stack+0x19/0x80 __kasan_kmalloc.constprop.10+0xc1/0xd0 kmem_cache_alloc+0xd2/0x220 mempool_alloc+0xda/0x230 bio_alloc_bioset+0x12d/0x2d0 xfs_rw_bdev+0x53/0x290 xlog_do_io+0xd1/0x1c0 xlog_bread+0x23/0x70 xlog_rseek_logrec_hdr+0x207/0x2a0 After a couple of minutes, the mount fails an ASSERT: XFS: Assertion failed: head_blk != tail_blk, file: fs/xfs/xfs_log_recover.c, line: 5236 And moments after KASAN spews this: BUG: KASAN: use-after-free in rwsem_down_read_slowpath+0x685/0x8f0 Read of size 4 at addr ffff88806f152778 by task systemd-udevd/2231 CPU: 4 PID: 2231 Comm: systemd-udevd Tainted: G D 5.3.0-rc5-dgc+ #1506 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: dump_stack+0x7c/0xc0 print_address_description+0x6c/0x322 ? rwsem_down_read_slowpath+0x685/0x8f0 __kasan_report.cold.6+0x1c/0x3e ? rwsem_down_read_slowpath+0x685/0x8f0 ? rwsem_down_read_slowpath+0x685/0x8f0 kasan_report+0xe/0x12 rwsem_down_read_slowpath+0x685/0x8f0 ? unwind_get_return_address_ptr+0x50/0x50 ? unwind_next_frame+0x6d6/0x8a0 ? __down_timeout+0x1c0/0x1c0 ? unwind_next_frame+0x6d6/0x8a0 ? _raw_spin_lock+0x87/0xe0 ? _raw_spin_lock+0x87/0xe0 ? __cpuidle_text_end+0x5/0x5 ? set_init_blocksize+0xe0/0xe0 ? preempt_count_sub+0x43/0x50 ? __might_sleep+0x31/0xd0 ? set_init_blocksize+0xe0/0xe0 ? ___might_sleep+0xc8/0xe0 down_read+0x18d/0x1a0 ? refcount_sub_and_test_checked+0xaf/0x150 ? rwsem_down_read_slowpath+0x8f0/0x8f0 ? _raw_spin_lock+0x87/0xe0 __get_super.part.12+0xf8/0x130 fsync_bdev+0xf/0x60 invalidate_partition+0x1e/0x40 rescan_partitions+0x8a/0x420 blkdev_reread_part+0x1e/0x30 blkdev_ioctl+0xb0b/0xe60 ? __blkdev_driver_ioctl+0x80/0x80 ? __bpf_prog_run64+0xc0/0xc0 ? stack_trace_save+0x8a/0xb0 ? save_stack+0x4d/0x80 ? __seccomp_filter+0x133/0xa10 ? preempt_count_sub+0x43/0x50 block_ioctl+0x6d/0x80 do_vfs_ioctl+0x134/0x9c0 ? ioctl_preallocate+0x140/0x140 ? selinux_file_ioctl+0x2b7/0x360 ? selinux_capable+0x20/0x20 ? syscall_trace_enter+0x233/0x540 ksys_ioctl+0x60/0x90 __x64_sys_ioctl+0x3d/0x50 do_syscall_64+0x70/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fade328a427 Code: 00 00 90 48 8b 05 69 aa 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 39 aa 0c 00 8 RSP: 002b:00007ffdc4755928 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 000000000000000e RCX: 00007fade328a427 RDX: 0000000000000000 RSI: 000000000000125f RDI: 000000000000000e RBP: 0000000000000000 R08: 0000559597306140 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 000055959736dbc0 R13: 0000000000000000 R14: 00007ffdc47569c8 R15: 000055959730dac0 Allocated by task 4739: save_stack+0x19/0x80 __kasan_kmalloc.constprop.10+0xc1/0xd0 kmem_cache_alloc_node+0xf3/0x240 copy_process+0x1f91/0x2f20 _do_fork+0xe0/0x530 __x64_sys_clone+0x10e/0x160 do_syscall_64+0x70/0x230 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 0: save_stack+0x19/0x80 __kasan_slab_free+0x12e/0x180 kmem_cache_free+0x84/0x2c0 rcu_core+0x35f/0x810 __do_softirq+0x15f/0x476 The buggy address belongs to the object at ffff888237048000 which belongs to the cache task_struct of size 9792 The buggy address is located 56 bytes inside of 9792-byte region [ffff888237048000, ffff88823704a640) The buggy address belongs to the page: page:ffffea0008dc1200 refcount:1 mapcount:0 mapping:ffff888078a91800 index:0x0 compound_mapcount: 0 flags: 0x57ffffc0010200(slab|head) raw: 0057ffffc0010200 dead000000000100 dead000000000122 ffff888078a91800 raw: 0000000000000000 0000000000030003 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff888237047f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888237047f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff888237048000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff888237048080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888237048100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ================================================================== Oh, hell, it's an rwsem that is referencing a free task. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx