On Wed, Sep 29 2021 at 7:59P -0400,
Jiazi Li <jqqlijiazi@xxxxxxxxx> wrote:

> dm_io_dec_pending call end_io_acct first, will dec md in-flight
> pending count. If a task is swapping table at same time.
> task1                             task2
> do_resume
>  ->do_suspend
>  ->dm_wait_for_completion
>                                   bio_endio
>                                   ->clone_endio
>                                   ->dm_io_dec_pending
>                                   ->end_io_acct
>                                   ->wakeup task1
>  ->dm_swap_table
>  ->__bind
>  ->__bind_mempools
>  ->bioset_exit
>  ->mempool_exit
>                                   ->free_io
>
> mempool->elements is NULL, and lead to following crash:
> [ 67.330330] Unable to handle kernel NULL pointer dereference at virtual
> address 0000000000000000
> ......
> [ 67.330494] pstate: 80400085 (Nzcv daIf +PAN -UAO)
> [ 67.330510] pc : mempool_free+0x70/0xa0
> [ 67.330515] lr : mempool_free+0x4c/0xa0
> [ 67.330520] sp : ffffff8008013b20
> [ 67.330524] x29: ffffff8008013b20 x28: 0000000000000004
> [ 67.330530] x27: ffffffa8c2ff40a0 x26: 00000000ffff1cc8
> [ 67.330535] x25: 0000000000000000 x24: ffffffdada34c800
> [ 67.330541] x23: 0000000000000000 x22: ffffffdada34c800
> [ 67.330547] x21: 00000000ffff1cc8 x20: ffffffd9a1304d80
> [ 67.330552] x19: ffffffdada34c970 x18: 000000b312625d9c
> [ 67.330558] x17: 00000000002dcfbf x16: 00000000000006dd
> [ 67.330563] x15: 000000000093b41e x14: 0000000000000010
> [ 67.330569] x13: 0000000000007f7a x12: 0000000034155555
> [ 67.330574] x11: 0000000000000001 x10: 0000000000000001
> [ 67.330579] x9 : 0000000000000000 x8 : 0000000000000000
> [ 67.330585] x7 : 0000000000000000 x6 : ffffff80148b5c1a
> [ 67.330590] x5 : ffffff8008013ae0 x4 : 0000000000000001
> [ 67.330596] x3 : ffffff80080139c8 x2 : ffffff801083bab8
> [ 67.330601] x1 : 0000000000000000 x0 : ffffffdada34c970
> [ 67.330609] Call trace:
> [ 67.330616] mempool_free+0x70/0xa0
> [ 67.330627] bio_put+0xf8/0x110
> [ 67.330638] dec_pending+0x13c/0x230
> [ 67.330644] clone_endio+0x90/0x180
> [ 67.330649] bio_endio+0x198/0x1b8
> [ 67.330655] dec_pending+0x190/0x230
> [ 67.330660] clone_endio+0x90/0x180
> [ 67.330665] bio_endio+0x198/0x1b8
> [ 67.330673] blk_update_request+0x214/0x428
> [ 67.330683] scsi_end_request+0x2c/0x300
> [ 67.330688] scsi_io_completion+0xa0/0x710
> [ 67.330695] scsi_finish_command+0xd8/0x110
> [ 67.330700] scsi_softirq_done+0x114/0x148
> [ 67.330708] blk_done_softirq+0x74/0xd0
> [ 67.330716] __do_softirq+0x18c/0x374
> [ 67.330724] irq_exit+0xb4/0xb8
> [ 67.330732] __handle_domain_irq+0x84/0xc0
> [ 67.330737] gic_handle_irq+0x148/0x1b0
> [ 67.330744] el1_irq+0xe8/0x190
> [ 67.330753] lpm_cpuidle_enter+0x4f8/0x538
> [ 67.330759] cpuidle_enter_state+0x1fc/0x398
> [ 67.330764] cpuidle_enter+0x18/0x20
> [ 67.330772] do_idle+0x1b4/0x290
> [ 67.330778] cpu_startup_entry+0x20/0x28
> [ 67.330786] secondary_start_kernel+0x160/0x170
>
> Move end_io_acct after free_io to fix this issue.
>
> Signed-off-by: Jiazi Li <lijiazi@xxxxxxxxxx>

Thanks very much for this. You did a wonderful job analyzing and
fixing this race.

I've tweaked the header slightly to improve clarity and made one
whitespace indentation adjustment.

I've now marked this for stable@ and queued this up.

Mike

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel
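The ordering problem in the patch can be illustrated with a small user-space model. This is a hypothetical sketch, not dm.c code. `struct model`, `end_acct()`, `release_io()`, `complete_io_buggy()`, `complete_io_fixed()` and `run()` are invented stand-ins for end_io_acct() and free_io(). Two plain counters replace the real in-flight accounting and the bioset mempool. The model only checks one thing: whether a waiter like dm_wait_for_completion() could see zero in-flight I/O while an io is still allocated from the pool that __bind_mempools() is about to tear down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: counters only, no real memory, threads or DM APIs. */
struct model {
    int in_flight;        /* stands in for the md in-flight pending count */
    int pool_outstanding; /* ios taken from the pool and not yet freed */
    bool unsafe_window;   /* set if a waiter could see in_flight == 0 too early */
};

/* A waiter woken at this point would go on to exit the mempools. */
static void check_wakeup(struct model *m)
{
    if (m->in_flight == 0 && m->pool_outstanding != 0)
        m->unsafe_window = true;
}

static void start_io(struct model *m)
{
    m->in_flight++;
    m->pool_outstanding++;
}

/* Models end_io_acct(): drop the count, possibly waking the waiter. */
static void end_acct(struct model *m)
{
    m->in_flight--;
    check_wakeup(m);
}

/* Models free_io(): return the io to the pool. */
static void release_io(struct model *m)
{
    m->pool_outstanding--;
}

/* Original order: accounting first, then free. */
static void complete_io_buggy(struct model *m)
{
    end_acct(m);
    release_io(m);
}

/* Patched order: free back to the pool first, then drop the count. */
static void complete_io_fixed(struct model *m)
{
    release_io(m);
    end_acct(m);
}

/* Issue n ios, complete them all, and report whether an unsafe window opened. */
static bool run(void (*complete)(struct model *), int n)
{
    struct model m = {0};
    for (int i = 0; i < n; i++)
        start_io(&m);
    for (int i = 0; i < n; i++)
        complete(&m);
    assert(m.in_flight == 0 && m.pool_outstanding == 0);
    return m.unsafe_window;
}
```

With the original order, the last completion lets the count reach zero while one io is still outstanding. That gap is the window in which task1 can run bioset_exit() before task2 reaches free_io(). With end_io_acct moved after free_io, the window never opens.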