On Thu, 3 Nov 2022, Mikulas Patocka wrote:

> > BTW, is the mempool_free from endio -> dec_count -> complete_io?
> > And io which caused the crash is from dm_io -> async_io / sync_io
> > -> dispatch_io, seems dm-raid1 can call it instead of dm-raid, so I
> > suppose the io is for mirror image.
> >
> > Thanks,
> > Guoqing
>
> I presume that the bug is caused by destruction of a bio set while bio
> from that set was in progress. When the bio finishes and an attempt is
> made to free the bio, a crash happens when the code tries to free the bio
> into a destroyed mempool.
>
> I can do more testing to validate this theory.
>
> Mikulas

When I disable tail-call optimizations with "-fno-optimize-sibling-calls",
I get this stacktrace:

[ 200.105367] Call Trace:
[ 200.105611] <TASK>
[ 200.105825] dump_stack_lvl+0x33/0x42
[ 200.106196] dump_stack+0xc/0xd
[ 200.106516] mempool_free.cold+0x22/0x32
[ 200.106921] bio_free+0x49/0x60
[ 200.107239] bio_put+0x95/0x100
[ 200.107567] super_written+0x4f/0x120 [md_mod]
[ 200.108020] bio_endio+0xe8/0x100
[ 200.108359] __dm_io_complete+0x1e9/0x300 [dm_mod]
[ 200.108847] clone_endio+0xf4/0x1c0 [dm_mod]
[ 200.109288] bio_endio+0xe8/0x100
[ 200.109621] __dm_io_complete+0x1e9/0x300 [dm_mod]
[ 200.110102] clone_endio+0xf4/0x1c0 [dm_mod]
[ 200.110543] bio_endio+0xe8/0x100
[ 200.110877] brd_submit_bio+0xf8/0x123 [brd]
[ 200.111310] __submit_bio+0x7a/0x120
[ 200.111670] submit_bio_noacct_nocheck+0xb6/0x2a0
[ 200.112138] submit_bio_noacct+0x12e/0x3e0
[ 200.112551] dm_submit_bio_remap+0x46/0xa0 [dm_mod]
[ 200.113036] flush_expired_bios+0x28/0x2f [dm_delay]
[ 200.113536] process_one_work+0x1b4/0x320
[ 200.113943] worker_thread+0x45/0x3e0
[ 200.114319] ? rescuer_thread+0x380/0x380
[ 200.114714] kthread+0xc2/0x100
[ 200.115035] ? kthread_complete_and_exit+0x20/0x20
[ 200.115517] ret_from_fork+0x1f/0x30
[ 200.115874] </TASK>

The function super_written is obviously buggy, because it first wakes up a
process and then calls bio_put(bio) - so the woken-up process is racing
with bio_put(bio) and the result is that we attempt to free a bio into a
destroyed bio set.

When I fix super_written, there are no longer any crashes. I'm posting a
patch in the next email.

Mikulas
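
The ordering problem described above can be sketched in user space. This is
only an illustration under stated assumptions, not the patch mentioned in
the message: struct pool, struct item, wake_waiter() and item_put() are
hypothetical stand-ins for the bio set, the bio, the wake-up of the waiting
process and bio_put(bio). The woken process is assumed to destroy the pool
immediately, which models the worst-case scheduling of the race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio set backed by a mempool. */
struct pool {
    bool destroyed;
    int frees_after_destroy;    /* frees that reached a destroyed pool */
};

/* Hypothetical stand-in for a bio allocated from that pool. */
struct item {
    struct pool *owner;
};

/*
 * Models the wake-up: in the worst case the woken process runs at once
 * and tears the pool down before the completion handler continues.
 */
static void wake_waiter(struct pool *p)
{
    p->destroyed = true;
}

/* Models bio_put(): hands the item back to the pool it came from. */
static void item_put(struct item *it)
{
    assert(it != NULL && it->owner != NULL);
    if (it->owner->destroyed)
        it->owner->frees_after_destroy++;   /* the crash case */
}

/* Order described as buggy: wake first, free afterwards. */
static int complete_wake_then_put(void)
{
    struct pool p = { false, 0 };
    struct item it = { &p };

    wake_waiter(&p);
    item_put(&it);
    return p.frees_after_destroy;
}

/* Reordered variant: free first, wake afterwards. */
static int complete_put_then_wake(void)
{
    struct pool p = { false, 0 };
    struct item it = { &p };

    item_put(&it);
    wake_waiter(&p);
    return p.frees_after_destroy;
}
```

In this model complete_wake_then_put() reports one free into a destroyed
pool, while complete_put_then_wake() reports none, because the item is
returned before anything is allowed to tear the pool down.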