Re: [PATCH 3/7] md: test for MD_RECOVERY_DONE in stop_sync_thread

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Thu, 18 Jan 2024 14:23:58 +0100 (CET)

On Wed, 17 Jan 2024, Song Liu wrote:

> On Wed, Jan 17, 2024 at 10:19 AM Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> >
> > stop_sync_thread sets MD_RECOVERY_INTR and then waits for
> > MD_RECOVERY_RUNNING to be cleared. However, md_do_sync does not clear
> > MD_RECOVERY_RUNNING when exiting; it sets MD_RECOVERY_DONE instead.
> >
> > So, we must wait for MD_RECOVERY_DONE to be set as well.
> >
> > This patch fixes a deadlock in the LVM2 test shell/integrity-caching.sh.
> 
> I am not able to reproduce the issue on a 6.7 kernel with
> shell/integrity-caching.sh.
> I got:
> 
> VERBOSE=0 ./lib/runner \
>         --testdir . --outdir results \
>         --flavours ndev-vanilla --only shell/integrity-caching.sh --skip @
> running 1 tests
> ###       passed: [ndev-vanilla] shell/integrity-caching.sh  4:24.225
> 
> ### 1 tests: 1 passed, 0 skipped, 0 timed out, 0 warned, 0 failed   in  4:24.453
> make[1]: Leaving directory '/root/lvm2/test'
> 
> Do you see the issue every time with shell/integrity-caching.sh?

Hmm, that's strange. I get a hang with this stack trace, sometimes
instantly and sometimes after 30 seconds. I am testing on the current
kernel from Linus' git tree, commit 052d534373b7ed33712a63d5e17b2b6cdbce84fd.
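
To spell out the condition from the patch description, here is a minimal
user-space sketch. It is not the kernel code: the SKETCH_* bit values and
the sketch_* function names are made up for illustration, and the real
flags are the MD_RECOVERY_* bits in mddev->recovery, tested with
test_bit(). It only compares the wait predicate before and after the patch.

```c
#include <stdbool.h>

/* Illustrative bit values only (assumption); the real MD_RECOVERY_*
 * bit numbers are defined in the kernel's md headers. */
enum {
	SKETCH_RECOVERY_RUNNING = 1 << 0,
	SKETCH_RECOVERY_INTR    = 1 << 1,
	SKETCH_RECOVERY_DONE    = 1 << 2,
};

/* Predicate before the patch: the wait ends only when RUNNING is
 * cleared, or when check_seq is set and sync_seq has changed. */
bool sketch_wait_done_old(unsigned long recovery, bool check_seq,
			  int saved_seq, int cur_seq)
{
	return !(recovery & SKETCH_RECOVERY_RUNNING) ||
	       (check_seq && saved_seq != cur_seq);
}

/* Predicate after the patch: DONE being set also ends the wait. */
bool sketch_wait_done_new(unsigned long recovery, bool check_seq,
			  int saved_seq, int cur_seq)
{
	return !(recovery & SKETCH_RECOVERY_RUNNING) ||
	       (recovery & SKETCH_RECOVERY_DONE) != 0 ||
	       (check_seq && saved_seq != cur_seq);
}
```

With RUNNING, INTR and DONE all set, which is the state the patch
description gives for md_do_sync having exited, the old predicate stays
false while the new one becomes true.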

Mikulas

> Thanks,
> Song
> 
> >
> > sysrq: Show Blocked State
> > task:lvm             state:D stack:0     pid:11422  tgid:11422 ppid:1374   flags:0x00004002
> > Call Trace:
> >  <TASK>
> >  __schedule+0x228/0x570
> >  schedule+0x29/0xa0
> >  schedule_timeout+0x6a/0xd0
> >  ? timer_shutdown_sync+0x10/0x10
> >  stop_sync_thread+0x141/0x180 [md_mod]
> >  ? housekeeping_test_cpu+0x30/0x30
> >  __md_stop_writes+0x10/0xd0 [md_mod]
> >  md_stop+0x9/0x20 [md_mod]
> >  raid_dtr+0x1e/0x60 [dm_raid]
> >  dm_table_destroy+0x53/0x110 [dm_mod]
> >  __dm_destroy+0x10b/0x1e0 [dm_mod]
> >  ? table_clear+0xa0/0xa0 [dm_mod]
> >  dev_remove+0xd4/0x110 [dm_mod]
> >  ctl_ioctl+0x2e1/0x570 [dm_mod]
> >  dm_ctl_ioctl+0x5/0x10 [dm_mod]
> >  __x64_sys_ioctl+0x85/0xa0
> >  do_syscall_64+0x5d/0x1a0
> >  entry_SYSCALL_64_after_hwframe+0x46/0x4e
> >
> > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> > Cc: stable@xxxxxxxxxxxxxxx      # v6.7
> > Fixes: 130443d60b1b ("md: refactor idle/frozen_sync_thread() to fix deadlock")
> >
> > ---
> >  drivers/md/md.c |    4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > Index: linux-2.6/drivers/md/md.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/md.c
> > +++ linux-2.6/drivers/md/md.c
> > @@ -4881,7 +4881,8 @@ static void stop_sync_thread(struct mdde
> >         if (check_seq)
> >                 sync_seq = atomic_read(&mddev->sync_seq);
> >
> > -       if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery)) {
> > +       if (!test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> > +           test_bit(MD_RECOVERY_DONE, &mddev->recovery)) {
> >                 if (!locked)
> >                         mddev_unlock(mddev);
> >                 return;
> > @@ -4901,6 +4902,7 @@ retry:
> >
> >         if (!wait_event_timeout(resync_wait,
> >                    !test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> > +                  test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
> >                    (check_seq && sync_seq != atomic_read(&mddev->sync_seq)),
> >                    HZ / 10))
> >                 goto retry;
> >
>