> 2022年5月29日 02:11,Lei Li <noctis.akm@xxxxxxxxx> 写道: > > Sorry for replying so late, because of my testing environment. After > several fio random writing and reboot tests, > this patch worked well. Here is the test result. > > 1. I successfully reproduced deadlock warning by reverting commit > 7e865eba00a3 ("bcache: fix potential > deadlock in cached_def_free()"). > [ 105.448074] ====================================================== > [ 105.449634] WARNING: possible circular locking dependency detected > [ 105.451185] 5.4.191-MANJARO #1 Not tainted > [ 105.452221] ------------------------------------------------------ > [ 105.453742] kworker/9:15/1272 is trying to acquire lock: > [ 105.455042] ffff9e9ef1c44348 > ((wq_completion)bcache_writeback_wq){+.+.}, at: > flush_workqueue+0x8a/0x550 > [ 105.457482] > [ 105.457482] but task is already holding lock: > [ 105.458926] ffffbaba01203e68 > ((work_completion)(&cl->work)#2){+.+.}, at: > process_one_work+0x1c3/0x5a0 > [ 105.461568] > [ 105.461568] which lock already depends on the new lock. > [ 105.461568] > [ 105.463810] > [ 105.463810] the existing dependency chain (in reverse order) is: > [ 105.465570] > [ 105.465570] -> #1 ((work_completion)(&cl->work)#2){+.+.}: > [ 105.467230] process_one_work+0x21a/0x5a0 > [ 105.468449] worker_thread+0x52/0x3c0 > [ 105.469653] kthread+0x132/0x150 > [ 105.470695] ret_from_fork+0x3a/0x50 > [ 105.471649] > [ 105.471649] -> #0 ((wq_completion)bcache_writeback_wq){+.+.}: > [ 105.473453] __lock_acquire+0x105b/0x1c50 > [ 105.474678] lock_acquire+0xc4/0x1b0 > [ 105.476007] flush_workqueue+0xad/0x550 > [ 105.477160] drain_workqueue+0xb6/0x170 > [ 105.478237] destroy_workqueue+0x36/0x290 > [ 105.479328] cached_dev_free+0x45/0x1e0 [bcache] > [ 105.480595] process_one_work+0x243/0x5a0 > [ 105.481714] worker_thread+0x52/0x3c0 > [ 105.482687] kthread+0x132/0x150 > [ 105.483667] ret_from_fork+0x3a/0x50 > [ 105.484707] > [ 105.484707] other info that might help us debug this: > [ 105.484707] > [ 105.486856] Possible unsafe locking scenario: > [ 105.486856] > [ 105.488314] CPU0 CPU1 > [ 105.489336] ---- ---- > [ 105.490438] lock((work_completion)(&cl->work)#2); > [ 105.491587] > lock((wq_completion)bcache_writeback_wq); > [ 105.493554] > lock((work_completion)(&cl->work)#2); > [ 105.495360] lock((wq_completion)bcache_writeback_wq); > [ 105.496878] > [ 105.496878] *** DEADLOCK *** > [ 105.496878] > [ 105.498303] 2 locks held by kworker/9:15/1272: > [ 105.499373] #0: ffff9e9f16c21148 ((wq_completion)events){+.+.}, > at: process_one_work+0x1c3/0x5a0 > [ 105.501514] #1: ffffbaba01203e68 > ((work_completion)(&cl->work)#2){+.+.}, at: > process_one_work+0x1c3/0x5a0 > [ 105.504221] > [ 105.504221] stack backtrace: > [ 105.505271] CPU: 9 PID: 1272 Comm: kworker/9:15 Not tainted > 5.4.191-MANJARO #1 > [ 105.507069] Hardware name: VMware, Inc. VMware Virtual > Platform/440BX Desktop Reference Platform, BIOS 6.00 11/12/2020 > [ 105.509485] Workqueue: events cached_dev_free [bcache] > [ 105.510805] Call Trace: > [ 105.511471] dump_stack+0x8b/0xbd > [ 105.512327] check_noncircular+0x198/0x1b0 > [ 105.513462] __lock_acquire+0x105b/0x1c50 > [ 105.514675] lock_acquire+0xc4/0x1b0 > [ 105.515557] ? flush_workqueue+0x8a/0x550 > [ 105.516535] flush_workqueue+0xad/0x550 > [ 105.517587] ? flush_workqueue+0x8a/0x550 > [ 105.518530] ? drain_workqueue+0xb6/0x170 > [ 105.519518] drain_workqueue+0xb6/0x170 > [ 105.520572] destroy_workqueue+0x36/0x290 > [ 105.521658] cached_dev_free+0x45/0x1e0 [bcache] > [ 105.522879] process_one_work+0x243/0x5a0 > [ 105.523854] worker_thread+0x52/0x3c0 > [ 105.524734] ? process_one_work+0x5a0/0x5a0 > [ 105.525768] kthread+0x132/0x150 > [ 105.526594] ? __kthread_bind_mask+0x60/0x60 > [ 105.527706] ret_from_fork+0x3a/0x50 > [ 105.571043] bcache: bcache_device_free() bcache0 stopped > [ 105.730801] bcache: cache_set_free() Cache set > 74f01341-0881-4c77-a49b-39fafcabb99e unregistered > [ 105.730819] bcache: bcache_reboot() All devices stopped > [ 105.930055] reboot: Restarting system > [ 105.930996] reboot: machine restart > [ 105.932182] ACPI MEMORY or I/O RESET_REG. > > 2. After applied 7e865eba00a3 and this patch, no warning showed again. > [ 58.296057] bcache: bcache_reboot() Stopping all devices: > [ 58.337730] bcache: bcache_device_free() bcache0 stopped > [ 58.484177] bcache: cache_set_free() Cache set > 74f01341-0881-4c77-a49b-39fafcabb99e unregistered > [ 58.484202] bcache: bcache_reboot() All devices stopped > [ 58.731929] reboot: Restarting system > [ 58.732845] reboot: machine restart > [ 58.734044] ACPI MEMORY or I/O RESET_REG. Hi Lei, Nice, then it is proved your patch removes redundant code without regression. I will take this patch for 5.20, thank you for the extra effort. Coly Li