Re: [PATCH 2/6] md: Revert "md: Make sure md_do_sync() will set MD_RECOVERY_DONE"

Song Liu <song@xxxxxxxxxx> · Thu, 29 Feb 2024 17:11:57 -0800

On Thu, Feb 29, 2024 at 4:49 PM Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> On Fri, Mar 1, 2024 at 7:46 AM Song Liu <song@xxxxxxxxxx> wrote:
> >
> > On Thu, Feb 29, 2024 at 2:53 PM Song Liu <song@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@xxxxxxxxxx> wrote:
> > > >
> > > > This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
> > > >
> > > > The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> > > > The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> > > > this problem.
> > > >
> > > > Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
> > >
> > > I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
> > > variation of it. Otherwise, we may hit the following deadlock. The test vm here
> > > has 2 raid arrays: one raid5 with journal, and a raid1.
> > >
> > > I pushed other patches in the set to the md-6.9-for-hch branch for
> > > further tests.
> >
> > Actually, it appears the md-6.9-for-hch branch still has this problem. Let me test
> > more.
> >
> > Song
> >
>
> Hi Song
>
> What are the commands you use for testing? Can you reproduce it with
> the 6.6 kernel?

The VM has these two arrays assembled automatically on boot. I can repro
the issue by simply rebooting the VM (which triggers a stop on both arrays). So
the repro is basically rebooting the VM in a loop via ssh.

For this branch,

https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.9-for-hch

which has 5 of the 6 patches in this set, I can reproduce the issue. This issue
doesn't happen on commit aee93ec0ec79, which is before this set.

Song