On Fri, 4 Nov 2022, Guoqing Jiang wrote:

> 
> 
> On 11/3/22 11:20 PM, Mikulas Patocka wrote:
> > 
> > On Thu, 3 Nov 2022, Mikulas Patocka wrote:
> > 
> > > > BTW, is the mempool_free from endio -> dec_count -> complete_io?
> > > > And io which caused the crash is from dm_io -> async_io / sync_io
> > > > -> dispatch_io, seems dm-raid1 can call it instead of dm-raid, so I
> > > > suppose the io is for mirror image.
> > > > 
> > > > Thanks,
> > > > Guoqing
> > > I presume that the bug is caused by destruction of a bio set while bio
> > > from that set was in progress. When the bio finishes and an attempt is
> > > made to free the bio, a crash happens when the code tries to free the bio
> > > into a destroyed mempool.
> > > 
> > > I can do more testing to validate this theory.
> > > 
> > > Mikulas
> > When I disable tail-call optimizations with "-fno-optimize-sibling-calls",
> > I get this stacktrace:
> 
> Just curious, is the option used for compile kernel or lvm?

I used it to compile the kernel. I set
export KCFLAGS="-march=barcelona -fno-optimize-sibling-calls"
and recompiled the kernel.

> BTW, this trace is different from previous one, and it is more
> understandable to me, thanks.
> 
> > [ 200.105367] Call Trace:
> > [ 200.105611]  <TASK>
> > [ 200.105825]  dump_stack_lvl+0x33/0x42
> > [ 200.106196]  dump_stack+0xc/0xd
> > [ 200.106516]  mempool_free.cold+0x22/0x32
> > [ 200.106921]  bio_free+0x49/0x60
> > [ 200.107239]  bio_put+0x95/0x100
> > [ 200.107567]  super_written+0x4f/0x120 [md_mod]
> > [ 200.108020]  bio_endio+0xe8/0x100
> > [ 200.108359]  __dm_io_complete+0x1e9/0x300 [dm_mod]
> > [ 200.108847]  clone_endio+0xf4/0x1c0 [dm_mod]
> > [ 200.109288]  bio_endio+0xe8/0x100
> > [ 200.109621]  __dm_io_complete+0x1e9/0x300 [dm_mod]
> > [ 200.110102]  clone_endio+0xf4/0x1c0 [dm_mod]
> 
> Assume the above from this chain.
> 
> clone_endio -> dm_io_dec_pending -> __dm_io_dec_pending -> dm_io_complete
> -> __dm_io_complete -> bio_endio
> > [ 200.110543]  bio_endio+0xe8/0x100
> > [ 200.110877]  brd_submit_bio+0xf8/0x123 [brd]
> > [ 200.111310]  __submit_bio+0x7a/0x120
> > [ 200.111670]  submit_bio_noacct_nocheck+0xb6/0x2a0
> > [ 200.112138]  submit_bio_noacct+0x12e/0x3e0
> > [ 200.112551]  dm_submit_bio_remap+0x46/0xa0 [dm_mod]
> > [ 200.113036]  flush_expired_bios+0x28/0x2f [dm_delay]
> 
> Was flush_expired_bios triggered by the path?
> 
> __dm_destroy or __dm_suspend -> suspend_targets -> delay_presuspend
> -> del_timer_sync(&dc->delay_timer) -> handle_delayed_timer

No - del_timer_sync doesn't call handle_delayed_timer.

The timer was set by "mod_timer(&dc->delay_timer, expires);". When the
time expires, it calls handle_delayed_timer. handle_delayed_timer calls
"queue_work(dc->kdelayd_wq, &dc->flush_expired_bios);" - this submits a
work item that calls "flush_expired_bios" and triggers this stacktrace.

> > [ 200.113536]  process_one_work+0x1b4/0x320
> > [ 200.113943]  worker_thread+0x45/0x3e0
> > [ 200.114319]  ? rescuer_thread+0x380/0x380
> > [ 200.114714]  kthread+0xc2/0x100
> > [ 200.115035]  ? kthread_complete_and_exit+0x20/0x20
> > [ 200.115517]  ret_from_fork+0x1f/0x30
> > [ 200.115874]  </TASK>
> > 
> > The function super_written is obviously buggy, because it first wakes up a
> > process and then calls bio_put(bio) - so the woken-up process is racing
> > with bio_put(bio) and the result is that we attempt to free a bio into a
> > destroyed bio set.
> Does it mean the woken-up process destroyed the bio set?

Yes.
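To make the ordering explicit, the tail of super_written looks roughly like
this (a simplified sketch, error handling omitted - not a verbatim copy of
the md code):

static void super_written(struct bio *bio)
{
	struct md_rdev *rdev = bio->bi_private;
	struct mddev *mddev = rdev->mddev;

	/* ... error handling omitted ... */

	/*
	 * This wake-up releases whoever is sleeping on mddev->sb_wait;
	 * that process may go on to tear the array down and destroy the
	 * bio set this bio was allocated from.
	 */
	if (atomic_dec_and_test(&mddev->pending_writes))
		wake_up(&mddev->sb_wait);
	rdev_dec_pending(rdev, mddev);
	/* By the time we get here, the bio set may already be gone. */
	bio_put(bio);
}

The bio has to be fully released (or a reference held on it) before anything
that can let the waiter proceed with the teardown.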
> The io for super write is allocated from sync_set, and the bio set is
> mostly destroyed in md_free_disk or md_stop, I assume md_stop is more
> relevant here as it is called by dm-raid.

Yes.

> So I guess the problem is, raid_dtr is called while in flight (or
> expired) bio still not completed,

When device mapper calls raid_dtr, there are no external bios in progress
(this is guaranteed by dm design). However, the bio that writes the
superblock is still in progress and it races with the md_stop. I am not an
expert in md, so I don't know if this is OK or not.

> maybe lvm issues cmd to call dm_table_destroy and dm was in the progress
> of suspend or destroy. Just my $0.02.
> 
> > When I fix super_written, there are no longer any crashes. I'm posting a
> > patch in the next email.
> 
> IIRC, the code existed for a long time, I'd suggest try with mdadm test
> suite first with the change. Cc Neil too.

I get an error when attempting to run the testsuite:

[debian:/usr/src/git/mdadm]# ./test
Testing on linux-6.1.0-rc3 kernel
/usr/src/git/mdadm/tests/00createnames... FAILED - see /var/tmp/00createnames.log and /var/tmp/fail00createnames.log for details
[debian:/usr/src/git/mdadm]# cat /var/tmp/00createnames.log
+ . /usr/src/git/mdadm/tests/00createnames
++ set -x -e
++ _create /dev/md/name
++ local DEVNAME=/dev/md/name
++ local NAME=
++ [[ -z '' ]]
++ mdadm -CR /dev/md/name -l0 -n 1 /dev/loop0 --force
++ rm -f /var/tmp/stderr
++ case $* in
++ case $* in
++ for args in $*
++ [[ -CR =~ /dev/ ]]
++ for args in $*
++ [[ /dev/md/name =~ /dev/ ]]
++ [[ /dev/md/name =~ md ]]
++ for args in $*
++ [[ -l0 =~ /dev/ ]]
++ for args in $*
++ [[ -n =~ /dev/ ]]
++ for args in $*
++ [[ 1 =~ /dev/ ]]
++ for args in $*
++ [[ /dev/loop0 =~ /dev/ ]]
++ [[ /dev/loop0 =~ md ]]
++ /usr/src/git/mdadm/mdadm --zero /dev/loop0
mdadm: Unrecognised md component device - /dev/loop0
++ for args in $*
++ [[ --force =~ /dev/ ]]
++ /usr/src/git/mdadm/mdadm --quiet -CR /dev/md/name -l0 -n 1 /dev/loop0 --force --auto=yes
++ rv=1
++ case $* in
++ cat /var/tmp/stderr
mdadm: unexpected failure opening /dev/md127
++ return 1
++ [[ 1 != \0 ]]
++ echo 'Cannot create device.'
Cannot create device.
++ exit 1
mdadm: unexpected failure opening /dev/md127
[debian:/usr/src/git/mdadm]# cat /var/tmp/fail00createnames.log
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!

> Thanks,
> Guoqing

Mikulas