On Fri, 4 Nov 2022, Guoqing Jiang wrote:

> 
> 
> On 11/3/22 11:20 PM, Mikulas Patocka wrote:
> > 
> > On Thu, 3 Nov 2022, Mikulas Patocka wrote:
> > 
> > > > BTW, is the mempool_free from endio -> dec_count -> complete_io?
> > > > And io which caused the crash is from dm_io -> async_io / sync_io
> > > > -> dispatch_io, seems dm-raid1 can call it instead of dm-raid, so I
> > > > suppose the io is for mirror image.
> > > > 
> > > > Thanks,
> > > > Guoqing
> > > I presume that the bug is caused by destruction of a bio set while bio
> > > from that set was in progress. When the bio finishes and an attempt is
> > > made to free the bio, a crash happens when the code tries to free the bio
> > > into a destroyed mempool.
> > > 
> > > I can do more testing to validate this theory.
> > > 
> > > Mikulas
> > When I disable tail-call optimizations with "-fno-optimize-sibling-calls",
> > I get this stacktrace:
> 
> Just curious, is the option used for compile kernel or lvm?

I used it to compile the kernel. I set
export KCFLAGS="-march=barcelona -fno-optimize-sibling-calls"
and recompiled the kernel.

> BTW, this trace is different from previous one, and it is more
> understandable to me, thanks.
> 
> > [ 200.105367] Call Trace:
> > [ 200.105611]  <TASK>
> > [ 200.105825]  dump_stack_lvl+0x33/0x42
> > [ 200.106196]  dump_stack+0xc/0xd
> > [ 200.106516]  mempool_free.cold+0x22/0x32
> > [ 200.106921]  bio_free+0x49/0x60
> > [ 200.107239]  bio_put+0x95/0x100
> > [ 200.107567]  super_written+0x4f/0x120 [md_mod]
> > [ 200.108020]  bio_endio+0xe8/0x100
> > [ 200.108359]  __dm_io_complete+0x1e9/0x300 [dm_mod]
> > [ 200.108847]  clone_endio+0xf4/0x1c0 [dm_mod]
> > [ 200.109288]  bio_endio+0xe8/0x100
> > [ 200.109621]  __dm_io_complete+0x1e9/0x300 [dm_mod]
> > [ 200.110102]  clone_endio+0xf4/0x1c0 [dm_mod]
> 
> Assume the above from this chain.
> 
> clone_endio -> dm_io_dec_pending -> __dm_io_dec_pending -> dm_io_complete
> -> __dm_io_complete -> bio_endio
> > [ 200.110543]  bio_endio+0xe8/0x100
> > [ 200.110877]  brd_submit_bio+0xf8/0x123 [brd]
> > [ 200.111310]  __submit_bio+0x7a/0x120
> > [ 200.111670]  submit_bio_noacct_nocheck+0xb6/0x2a0
> > [ 200.112138]  submit_bio_noacct+0x12e/0x3e0
> > [ 200.112551]  dm_submit_bio_remap+0x46/0xa0 [dm_mod]
> > [ 200.113036]  flush_expired_bios+0x28/0x2f [dm_delay]
> 
> Was flush_expired_bios triggered by the path?
> 
> __dm_destroy or __dm_suspend -> suspend_targets -> delay_presuspend
> -> del_timer_sync(&dc->delay_timer) -> handle_delayed_timer

No - del_timer_sync doesn't call handle_delayed_timer.

The timer was set by "mod_timer(&dc->delay_timer, expires);". When the
time expires, it calls handle_delayed_timer. handle_delayed_timer calls
"queue_work(dc->kdelayd_wq, &dc->flush_expired_bios);" - this submits a
work item that calls "flush_expired_bios" and triggers this stacktrace.

> > [ 200.113536]  process_one_work+0x1b4/0x320
> > [ 200.113943]  worker_thread+0x45/0x3e0
> > [ 200.114319]  ? rescuer_thread+0x380/0x380
> > [ 200.114714]  kthread+0xc2/0x100
> > [ 200.115035]  ? kthread_complete_and_exit+0x20/0x20
> > [ 200.115517]  ret_from_fork+0x1f/0x30
> > [ 200.115874]  </TASK>
> > 
> > The function super_written is obviously buggy, because it first wakes up a
> > process and then calls bio_put(bio) - so the woken-up process is racing
> > with bio_put(bio) and the result is that we attempt to free a bio into a
> > destroyed bio set.
> Does it mean the woken-up process destroyed the bio set?

Yes.
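To make the ordering explicit, the tail of super_written looks roughly like
this (a simplified sketch, error handling omitted - not a verbatim copy of
the md code):

static void super_written(struct bio *bio)
{
	struct md_rdev *rdev = bio->bi_private;
	struct mddev *mddev = rdev->mddev;

	/* ... error handling omitted ... */

	/*
	 * This wake-up releases whoever is sleeping on mddev->sb_wait;
	 * that process may go on to tear the array down and destroy the
	 * bio set this bio was allocated from.
	 */
	if (atomic_dec_and_test(&mddev->pending_writes))
		wake_up(&mddev->sb_wait);
	rdev_dec_pending(rdev, mddev);
	/* By the time we get here, the bio set may already be gone. */
	bio_put(bio);
}

The bio has to be fully released (or a reference held on it) before anything
that can let the waiter proceed with the teardown.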
> The io for super write is allocated from sync_set, and the bio set is
> mostly destroyed in md_free_disk or md_stop, I assume md_stop is more
> relevant here as it is called by dm-raid.

Yes.

> So I guess the problem is, raid_dtr is called while in flight (or
> expired) bio still not completed,

When device mapper calls raid_dtr, there are no external bios in progress
(this is guaranteed by dm design). However, the bio that writes the
superblock is still in progress and it races with the md_stop. I am not an
expert in md, so I don't know if this is OK or not.

> maybe lvm issues cmd to call dm_table_destroy and dm was in the progress
> of suspend or destroy. Just my $0.02.
> 
> > When I fix super_written, there are no longer any crashes. I'm posting a
> > patch in the next email.
> 
> IIRC, the code existed for a long time, I'd suggest try with mdadm test
> suite first with the change. Cc Neil too.

I get an error when attempting to run the testsuite:

[debian:/usr/src/git/mdadm]# ./test
Testing on linux-6.1.0-rc3 kernel
/usr/src/git/mdadm/tests/00createnames... FAILED - see /var/tmp/00createnames.log and /var/tmp/fail00createnames.log for details
[debian:/usr/src/git/mdadm]# cat /var/tmp/00createnames.log
+ . /usr/src/git/mdadm/tests/00createnames
++ set -x -e
++ _create /dev/md/name
++ local DEVNAME=/dev/md/name
++ local NAME=
++ [[ -z '' ]]
++ mdadm -CR /dev/md/name -l0 -n 1 /dev/loop0 --force
++ rm -f /var/tmp/stderr
++ case $* in
++ case $* in
++ for args in $*
++ [[ -CR =~ /dev/ ]]
++ for args in $*
++ [[ /dev/md/name =~ /dev/ ]]
++ [[ /dev/md/name =~ md ]]
++ for args in $*
++ [[ -l0 =~ /dev/ ]]
++ for args in $*
++ [[ -n =~ /dev/ ]]
++ for args in $*
++ [[ 1 =~ /dev/ ]]
++ for args in $*
++ [[ /dev/loop0 =~ /dev/ ]]
++ [[ /dev/loop0 =~ md ]]
++ /usr/src/git/mdadm/mdadm --zero /dev/loop0
mdadm: Unrecognised md component device - /dev/loop0
++ for args in $*
++ [[ --force =~ /dev/ ]]
++ /usr/src/git/mdadm/mdadm --quiet -CR /dev/md/name -l0 -n 1 /dev/loop0 --force --auto=yes
++ rv=1
++ case $* in
++ cat /var/tmp/stderr
mdadm: unexpected failure opening /dev/md127
++ return 1
++ [[ 1 != \0 ]]
++ echo 'Cannot create device.'
Cannot create device.
++ exit 1
mdadm: unexpected failure opening /dev/md127
[debian:/usr/src/git/mdadm]# cat /var/tmp/fail00createnames.log
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!
## debian.vm: saving dmesg.
## debian.vm: saving proc mdstat.
Personalities : [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
unused devices: <none>
## debian.vm: no array assembled!

> Thanks,
> Guoqing

Mikulas