On Sat, Oct 22, 2022 at 5:10 AM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Fri, Oct 21, 2022 at 3:07 AM Xiao Ni <xni@xxxxxxxxxx> wrote:
> >
> > On Fri, Oct 21, 2022 at 3:50 AM Song Liu <song@xxxxxxxxxx> wrote:
> > >
> > > On Sun, Oct 16, 2022 at 7:11 PM Xiao Ni <xni@xxxxxxxxxx> wrote:
> > > >
> > > > It has added io_acct_set for raid0/raid5 io accounting and it needs to
> > > > alloc md_io_acct in the i/o path. They are free when the bios come back
> > > > from member disks. Now we don't have a method to monitor if those bios
> > > > are all come back. In the takeover process, it needs to free the raid0
> > > > memory resource including the memory pool for md_io_acct. But maybe some
> > > > bios are still not returned. When those bios are returned, it can cause
> > > > panic because of introducing NULL pointer or invalid address.
> > > >
> > > > This patch adds io_acct_cnt. So when stopping raid0, it can use this
> > > > to wait until all bios come back.
> > > >
> > > > Reported-by: Fine Fan <ffan@xxxxxxxxxx>
> > > > Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
> > >
> > > I have seen a lot of warnings and errors in dmesg with this patch. For example:
> > >
> > > [ 402.116463] =============================================================================
> > > [ 402.117176] BUG bio-144 (Tainted: G B W ): Right
> > > Redzone overwritten
> > > [ 402.117837] -----------------------------------------------------------------------------
> > > [ 402.117837]
> > > [ 402.118713] 0xffff88816f683cd0-0xffff88816f683cd7 @offset=15568.
> > > First byte 0x0 instead of 0xcc
> > > [ 402.119505] Allocated in mempool_alloc+0x79/0x1a0 age=1038 cpu=19 pid=1130
> > > [ 402.120133] kmem_cache_alloc+0x2dc/0x3c0
> > > [ 402.120510] mempool_alloc+0x79/0x1a0
> > > [ 402.120840] bio_alloc_bioset+0xcb/0x530
> > > [ 402.121205] bio_alloc_clone+0x20/0x60
> > > [ 402.121560] md_account_bio+0x41/0x80
> > > [ 402.121890] raid5_make_request+0x1cf/0x1450
> > > [ 402.122327] md_handle_request+0x26c/0x3f0
> > > [ 402.122700] __submit_bio+0x53/0x180
> > > [ 402.123030] submit_bio_noacct_nocheck+0xe8/0x2b0
> > > [ 402.123453] __blkdev_direct_IO_async+0x109/0x1d0
> > > [ 402.123897] generic_file_direct_write+0x9c/0x1e0
> > > [ 402.124332] __generic_file_write_iter+0x95/0x170
> > > [ 402.124771] blkdev_write_iter+0xe9/0x180
> > > [ 402.125162] aio_write+0x11a/0x2e0
> > > [ 402.125503] io_submit_one+0x627/0xd20
> > > [ 402.125844] __x64_sys_io_submit+0x88/0x250
> > > [ 402.126223] Slab 0xffffea0005bda000 objects=51 used=51
> > > fp=0x0000000000000000 flags=0x200000000010200(slab|head|node=0|zone=2)
> > > [ 402.127227] Object 0xffff88816f683c40 @offset=15424 fp=0x0000000000000000
> > > [ 402.127227]
> > > [ 402.127960] Redzone ffff88816f683c00: cc cc cc cc cc cc cc cc cc
> > > cc cc cc cc cc cc cc ................
> > > [ 402.128797] Redzone ffff88816f683c10: cc cc cc cc cc cc cc cc cc
> > > cc cc cc cc cc cc cc ................
> > > [ 402.129665] Redzone ffff88816f683c20: cc cc cc cc cc cc cc cc cc
> > > cc cc cc cc cc cc cc ................
> > > [ 402.130503] Redzone ffff88816f683c30: cc cc cc cc cc cc cc cc cc
> > > cc cc cc cc cc cc cc ................
> > > [ 402.131336] Object ffff88816f683c40: 80 a3 68 6f 81 88 ff ff af
> > > 21 00 00 01 00 00 00 ..ho.....!......
> > > [ 402.132166] Object ffff88816f683c50: 00 00 00 00 00 00 00 00 80
> > > 23 09 0b 81 88 ff ff .........#......
> > > [ 402.132996] Object ffff88816f683c60: 01 88 00 00 02 00 04 40 00
> > > 5a 5a 5a 00 00 00 00 .......@.ZZZ....
> > > [ 402.133822] Object ffff88816f683c70: 88 86 1c 00 00 00 00 00 00
> > > 10 00 00 00 00 00 00 ................
> > > [ 402.134647] Object ffff88816f683c80: 00 00 00 00 ff ff ff ff e0
> > > a9 a8 81 ff ff ff ff ................
> > > [ 402.135501] Object ffff88816f683c90: 40 3c 68 6f 81 88 ff ff 00
> > > 00 00 00 00 00 00 00 @<ho............
> > > [ 402.136354] Object ffff88816f683ca0: 00 00 00 00 00 00 00 00 00
> > > 00 00 00 00 00 00 00 ................
> > > [ 402.137174] Object ffff88816f683cb0: 00 00 00 00 00 00 00 00 00
> > > 00 00 00 01 00 00 00 ................
> > > [ 402.138027] Object ffff88816f683cc0: 00 a4 68 6f 81 88 ff ff 40
> > > 2f c4 73 81 88 ff ff ..ho....@/.s....
> > > [ 402.138857] Redzone ffff88816f683cd0: 00 20 c4 73 81 88 ff ff
> > > . .s....
> > > [ 402.139657] Padding ffff88816f683d20: 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > > 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > [ 402.140510] Padding ffff88816f683d30: 5a 5a 5a 5a 5a 5a 5a 5a 5a
> > > 5a 5a 5a 5a 5a 5a 5a ZZZZZZZZZZZZZZZZ
> > > [ 402.141345] CPU: 29 PID: 1092 Comm: md0_raid5 Tainted: G B W
> > > 6.1.0-rc1+ #145
> > > [ 402.142083] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> > > BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014
> > > [ 402.143127] Call Trace:
> > > [ 402.143365] <TASK>
> > > [ 402.143563] dump_stack_lvl+0x45/0x5d
> > > [ 402.143899] check_bytes_and_report.cold+0x6d/0x85
> > > [ 402.144343] check_object+0x1fa/0x2d0
> > > [ 402.144675] free_debug_processing+0x1bc/0x660
> > > [ 402.145091] ? md_end_io_acct+0x3c/0x80
> > > [ 402.145464] ? md_end_io_acct+0x3c/0x80
> > > [ 402.145812] kmem_cache_free+0x55f/0x5b0
> > > [ 402.146164] md_end_io_acct+0x3c/0x80
> > > [ 402.146498] handle_stripe+0x11a5/0x1d70
> > > [ 402.146849] handle_active_stripes.constprop.0+0x487/0x5e0
> > > [ 402.147353] raid5d+0x40d/0x680
> > > [ 402.147640] ? lock_acquire+0x1ad/0x310
> > > [ 402.147989] md_thread+0xc2/0x170
> > > [ 402.148319] ? prepare_to_wait_exclusive+0xe0/0xe0
> > > [ 402.148749] ? register_md_personality+0x90/0x90
> > > [ 402.149162] kthread+0xf2/0x120
> > > [ 402.149455] ? kthread_complete_and_exit+0x20/0x20
> > > [ 402.149884] ret_from_fork+0x22/0x30
> > > [ 402.150211] </TASK>
> > > [ 402.150431] FIX bio-144: Restoring Right Redzone
> > > 0xffff88816f683cd0-0xffff88816f683cd7=0xcc
> > > [ 402.151196] FIX bio-144: Object at 0xffff88816f683c40 not freed
> > >
> > > Please fix them and resend.
> > >
> > > Thanks,
> > > Song
> >
> > Hi Song
> >
> > What commands do you run? I've run some tests and didn't see the messages.
> > By the way, what disks do you use?
>
> I see these with regular IO. Some fio-libaio-direct workload should trigger it.
> This is running in Qemu on virtual nvme devices, and with some debug
> options enabled (KASAN, LOCKDEP, etc.).

Hi Song

I have reproduced this. Thanks for pointing this out. I'll fix this and
re-send v2.

Regards
Xiao
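
For readers following the thread, the idea in the patch description (count the accounting bios that are in flight, and make the stop path wait for that count to reach zero before freeing the pool) can be sketched in user space. This is only an illustration under assumptions, not the md code: struct acct_dev, acct_submit(), acct_complete(), acct_stop() and run_demo() are hypothetical names, threads stand in for completions coming back from member disks, and a mutex plus condition variable stands in for whatever counting and waiting primitives the actual patch uses.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_IOS 8

/* Hypothetical stand-in for the array state; NOT the kernel's struct mddev. */
struct acct_dev {
	pthread_mutex_t lock;
	pthread_cond_t  drained;     /* signalled when io_acct_cnt drops to 0 */
	int             io_acct_cnt; /* accounting clones submitted, not yet completed */
	long           *pool;        /* stands in for the md_io_acct memory pool */
};

/* Submission path: account for the I/O before it can possibly complete. */
static void acct_submit(struct acct_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->io_acct_cnt++;
	pthread_mutex_unlock(&dev->lock);
}

/* Completion path: touch the pool first, then drop the in-flight count. */
static void *acct_complete(void *arg)
{
	struct acct_dev *dev = arg;

	pthread_mutex_lock(&dev->lock);
	dev->pool[0]++;                 /* would be a use-after-free without the wait */
	if (--dev->io_acct_cnt == 0)
		pthread_cond_broadcast(&dev->drained);
	pthread_mutex_unlock(&dev->lock);
	return NULL;
}

/* Stop path: wait until nothing is in flight, only then free the pool. */
static long acct_stop(struct acct_dev *dev)
{
	long completed;

	pthread_mutex_lock(&dev->lock);
	while (dev->io_acct_cnt > 0)
		pthread_cond_wait(&dev->drained, &dev->lock);
	completed = dev->pool[0];
	free(dev->pool);
	dev->pool = NULL;
	pthread_mutex_unlock(&dev->lock);
	return completed;
}

/* Returns 0 if every completion ran before the pool was freed. */
static int run_demo(void)
{
	struct acct_dev dev = { .io_acct_cnt = 0 };
	pthread_t workers[NR_IOS];
	long completed;
	int i;

	pthread_mutex_init(&dev.lock, NULL);
	pthread_cond_init(&dev.drained, NULL);
	dev.pool = calloc(1, sizeof(*dev.pool));
	if (!dev.pool)
		return -1;

	for (i = 0; i < NR_IOS; i++) {
		acct_submit(&dev);
		/* Sketch only: no cleanup on thread-creation failure. */
		if (pthread_create(&workers[i], NULL, acct_complete, &dev) != 0)
			return -1;
	}

	completed = acct_stop(&dev);    /* blocks until all NR_IOS completions ran */

	for (i = 0; i < NR_IOS; i++)
		pthread_join(workers[i], NULL);
	pthread_cond_destroy(&dev.drained);
	pthread_mutex_destroy(&dev.lock);

	return (completed == NR_IOS && dev.pool == NULL) ? 0 : -1;
}
```

To try it, call run_demo() from a main() and build with -pthread. The ordering is what matters: the count is raised at submit time, before a completion can run, so the stop path can never observe zero while a completion that still needs the pool is outstanding. The sketch does not model the redzone overwrite reported above, which is a separate problem in the posted patch.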