On Wed, 17 Jan 2024, Song Liu wrote:

> On Wed, Jan 17, 2024 at 10:19 AM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> >
> > stop_sync_thread sets MD_RECOVERY_INTR and then waits for
> > MD_RECOVERY_RUNNING to be cleared. However, md_do_sync will not clear
> > MD_RECOVERY_RUNNING when exiting, it will set MD_RECOVERY_DONE instead.
> >
> > So, we must wait for MD_RECOVERY_DONE to be set as well.
> >
> > This patch fixes a deadlock in the LVM2 test shell/integrity-caching.sh.
>
> I am not able to reproduce the issue on 6.7 kernel with
> shell/integrity-caching.sh.
> I got:
>
> VERBOSE=0 ./lib/runner \
>         --testdir . --outdir results \
>         --flavours ndev-vanilla --only shell/integrity-caching.sh --skip @
> running 1 tests
> ### passed: [ndev-vanilla] shell/integrity-caching.sh 4:24.225
>
> ### 1 tests: 1 passed, 0 skipped, 0 timed out, 0 warned, 0 failed in 4:24.453
> make[1]: Leaving directory '/root/lvm2/test'
>
> Do you see the issue every time with shell/integrity-caching.sh?

Hmm, that's strange - I get a hang with this stacktrace sometimes
instantly, sometimes in 30 seconds. I test it on the current kernel from
Linus' git - 052d534373b7ed33712a63d5e17b2b6cdbce84fd.

Mikulas

> Thanks,
> Song
>
> >
> > sysrq: Show Blocked State
> > task:lvm state:D stack:0 pid:11422 tgid:11422 ppid:1374 flags:0x00004002
> > Call Trace:
> >  <TASK>
> >  __schedule+0x228/0x570
> >  schedule+0x29/0xa0
> >  schedule_timeout+0x6a/0xd0
> >  ? timer_shutdown_sync+0x10/0x10
> >  stop_sync_thread+0x141/0x180 [md_mod]
> >  ? housekeeping_test_cpu+0x30/0x30
> >  __md_stop_writes+0x10/0xd0 [md_mod]
> >  md_stop+0x9/0x20 [md_mod]
> >  raid_dtr+0x1e/0x60 [dm_raid]
> >  dm_table_destroy+0x53/0x110 [dm_mod]
> >  __dm_destroy+0x10b/0x1e0 [dm_mod]
> >  ? table_clear+0xa0/0xa0 [dm_mod]
> >  dev_remove+0xd4/0x110 [dm_mod]
> >  ctl_ioctl+0x2e1/0x570 [dm_mod]
> >  dm_ctl_ioctl+0x5/0x10 [dm_mod]
> >  __x64_sys_ioctl+0x85/0xa0
> >  do_syscall_64+0x5d/0x1a0
> >  entry_SYSCALL_64_after_hwframe+0x46/0x4e
> >
> > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx # v6.7
> > Fixes: 130443d60b1b ("md: refactor idle/frozen_sync_thread() to fix deadlock")
> >
> > ---
> >  drivers/md/md.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/drivers/md/md.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/md.c
> > +++ linux-2.6/drivers/md/md.c
> > @@ -4881,7 +4881,8 @@ static void stop_sync_thread(struct mdde
> >  	if (check_seq)
> >  		sync_seq = atomic_read(&mddev->sync_seq);
> >
> > -	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> > +	if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> > +	    test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
> >  		if (!locked)
> >  			mddev_unlock(mddev);
> >  		return;
> > @@ -4901,6 +4902,7 @@ retry:
> >
> >  	if (!wait_event_timeout(resync_wait,
> >  		   !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> > +		   test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
> >  		   (check_seq && sync_seq != atomic_read(&mddev->sync_seq)),
> >  		   HZ / 10))
> >  		goto retry;
> >
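
For anyone following the thread without the md source at hand, the change
to the wait condition in the patch above can be modelled in userspace
roughly as follows. This is only a sketch of the predicate, not kernel
code: struct model_mddev, the MODEL_* bit positions and both helper names
are made up for illustration, and the real flag values are defined in
drivers/md/md.h.

```c
#include <stdbool.h>

/* Illustrative bit positions only; NOT the kernel's actual values. */
enum {
	MODEL_RECOVERY_RUNNING = 1 << 0,
	MODEL_RECOVERY_INTR    = 1 << 1,
	MODEL_RECOVERY_DONE    = 1 << 2,
};

/* Hypothetical stand-in for the two mddev fields the condition reads. */
struct model_mddev {
	unsigned long recovery;	/* flag word, as in mddev->recovery */
	int sync_seq;		/* stands in for atomic_read(&mddev->sync_seq) */
};

/* Wait condition before the patch: only RUNNING being cleared ends the wait. */
static inline bool wait_done_before(const struct model_mddev *m,
				    bool check_seq, int seen_seq)
{
	return !(m->recovery & MODEL_RECOVERY_RUNNING) ||
	       (check_seq && seen_seq != m->sync_seq);
}

/* Wait condition after the patch: DONE being set also ends the wait. */
static inline bool wait_done_after(const struct model_mddev *m,
				   bool check_seq, int seen_seq)
{
	return !(m->recovery & MODEL_RECOVERY_RUNNING) ||
	       (m->recovery & MODEL_RECOVERY_DONE) ||
	       (check_seq && seen_seq != m->sync_seq);
}
```

With RUNNING, INTR and DONE all set and check_seq false, the first
predicate never becomes true, which corresponds to the task stuck in
stop_sync_thread in the trace; the second one returns true in that state.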