Re: [PATCH v5 03/14] md: make sure md_do_sync() will set MD_RECOVERY_DONE

Xiao Ni <xni@xxxxxxxxxx> · Sun, 18 Feb 2024 16:41:27 +0800

On Sun, Feb 18, 2024 at 2:51 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> 在 2024/02/18 13:56, Xiao Ni 写道:
> > On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:
> >>
> >> From: Yu Kuai <yukuai3@xxxxxxxxxx>
> >>
> >> stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must
> >> set MD_RECOVERY_DONE, so that follow up md_check_recovery() will
> >> unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up
> >> stop_sync_thread().
> >>
> >> If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will
> >> return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4
> >> ("md: fix stopping sync thread"), dm-raid switch from
> >> md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread
> >> from md_stop() and md_stop_writes(), causing the test
> >> shell/lvconvert-raid-reshape.sh hang.
> >>
> >> We shouldn't switch back to md_reap_sync_thread() because it's
> >> problematic in the first place. Fix the problem by making sure
> >> md_do_sync() will set MD_RECOVERY_DONE.
> >>
> >> Reported-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
> >> Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@xxxxxxxxxx/
> >> Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
> >> Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock")
> >> Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
> >> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
> >> ---
> >>   drivers/md/md.c | 12 ++++++++----
> >>   1 file changed, 8 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 6906d023f1d6..c65dfd156090 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread)
> >>          int ret;
> >>
> >>          /* just incase thread restarts... */
> >> -       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
> >> -           test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
> >> +       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
> >>                  return;
> >> -       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
> >> +
> >> +       if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
> >> +               goto skip;
> >> +
> >> +       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
> >> +           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
> >>                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
> >> -               return;
> >> +               goto skip;
> >>          }
> >
> > Hi all
> >
> > I have a question here. The codes above means if MD_RECOVERY_WAIT is
> > set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen.
> > But from the codes in md_start function:
> >
> >                  set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
> >                  md_wakeup_thread(mddev->thread);
> >                  ret = mddev->pers->start(mddev);
> >                  clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
> >                  md_wakeup_thread(mddev->sync_thread);
> >
> > MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it".
> > I guess this patch can introduce a new bug for raid5 journal?
>
> I'm not sure what kind of problem you're talking about. After patch 4,
> md_start_sync() should be the only place to register sync_thread, hence
> md_start() should not see registered sync_thread. Perhaps
> MD_RECOVERY_WAIT and md_wakeup_thread(mddev->sync_thread) can be removed
> after patch 4?

Hi Kuai

Before this patch, the process is:
1. set MD_RECOVERY_WAIT
2. start sync thread, sync thread can't run until MD_RECOVERY_WAIT is cleared
3. do something
4. clear MD_RECOVERY_WAIT
5. sync thread (md_do_sync) can run

After this patch, step2 returns directly because MD_RECOVERY_INTR is
set. By this patch, MD_RECOVERY_WAIT has the same meaning as
MD_RECOVERY_INTR.  So this patch breaks one logic.

MD_RECOVERY_WAIT is introduced by patch
d5d885fd514fcebc9da5503c88aa0112df7514ef (md: introduce new
personality funciton start()). Then dm raid uses it to delay sync
thread too.

Back to the deadlock which this patch tries to fix.
The original report is reshape is stuck and can be reproduced easily
by these commands:
modprobe brd rd_size=34816 rd_nr=5
vgcreate test_vg /dev/ram*
lvcreate --type raid5 -L 16M -n test_lv test_vg
lvconvert -y --stripes 4 /dev/test_vg/test_lv
vgremove test_vg -ff

The root cause is that dm raid stopped the sync thread directly
before. It works even MD_RECOVERY_WAIT is set. Now we stop sync thread
asynchronously. Because MD_RECOVERY_WAIT is set, when stopping dm
raid, it can't set MD_RECOVERY_DONE in md_do_sync. It's the reason
it's stuck at stop_sync_thread. So to fix this deadlock, dm raid
should clear MD_RECOVERY_WAIT before stopping the sync thread.

dm raid stop process:
1. dm_table_postsuspend_targets -> raid_postsuspend -> md_stop_writes.
2. dm_table_destroy -> raid_dtr

So we need to clear MD_RECOVERY_WAIT before calling md_stop_writes.

>
> >
> > And to resolve this deadlock, we can use this patch:
> >
> > --- a/drivers/md/dm-raid.c
> > +++ b/drivers/md/dm-raid.c
> > @@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti)
> >          struct raid_set *rs = ti->private;
> >
> >          if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
> > +               if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
> > +                       clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);
>
> You must make sure md_do_sync() is called after this if sync_thread is
> already registered, and I don't understand yet how this is guranteed. :(

md_stop_writes -> __md_stop_writes -> stop_sync_thread guarantee this.

Best Regards
Xiao
>
> Thanks,
> Kuai
>
> >
> > Regards
> > Xiao
> >>
> >>          if (mddev_is_clustered(mddev)) {
> >> --
> >> 2.39.2
> >>
> >
> > .
> >
>