On Thu, Feb 29, 2024 at 4:49 PM Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> On Fri, Mar 1, 2024 at 7:46 AM Song Liu <song@xxxxxxxxxx> wrote:
> >
> > On Thu, Feb 29, 2024 at 2:53 PM Song Liu <song@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Feb 29, 2024 at 7:50 AM Xiao Ni <xni@xxxxxxxxxx> wrote:
> > > >
> > > > This reverts commit 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6.
> > > >
> > > > The root cause is that MD_RECOVERY_WAIT isn't cleared when stopping raid.
> > > > The following patch 'Clear MD_RECOVERY_WAIT when stopping dmraid' fixes
> > > > this problem.
> > > >
> > > > Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
> > >
> > > I think we still need 82ec0ae59d02e89164b24c0cc8e4e50de78b5fd6 or some
> > > variation of it. Otherwise, we may hit the following deadlock. The test vm here
> > > has 2 raid arrays: one raid5 with journal, and a raid1.
> > >
> > > I pushed other patches in the set to the md-6.9-for-hch branch for
> > > further tests.
> >
> > Actually, it appears md-6.9-for-hch branch still has this problem. Let me test
> > more..
> >
> > Song
> >
>
> Hi Song
>
> What are the commands you use for testing? Can you reproduce it with
> the 6.6 kernel?

The VM has these two arrays assembled automatically on boot. I can repro
the issue by simply rebooting the VM (which triggers stop array on both).
So the repro is basically rebooting the VM in a loop via ssh.

For this branch,

https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/log/?h=md-6.9-for-hch

which has 5 of the 6 patches in this set, I can reproduce the issue.

This issue doesn't happen on commit aee93ec0ec79, which is before this set.

Song
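
For anyone who wants to try the same thing, the reboot loop described above could look roughly like the sketch below. This is only an illustration, not the exact script used here: the host name `root@test-vm`, the iteration count, and the 60-second boot delay are all assumptions, and the helper names are made up for this example.

```python
import subprocess
import time


def build_reboot_command(host):
    """Return the ssh argv that asks the test VM to reboot.

    `host` is a hypothetical ssh target such as "root@test-vm".
    """
    return ["ssh", host, "reboot"]


def reboot_loop(host, iterations, run=subprocess.run, wait=time.sleep,
                boot_delay=60):
    """Reboot `host` `iterations` times and return the commands issued.

    Each reboot stops the arrays on the way down. `boot_delay` (an assumed
    value, in seconds) gives the VM time to boot and reassemble them.
    `run` and `wait` are injectable so the loop can be exercised without
    a real VM.
    """
    issued = []
    for _ in range(iterations):
        cmd = build_reboot_command(host)
        # ssh usually exits non-zero when the connection drops during
        # reboot, so the exit status is deliberately not checked.
        run(cmd, check=False)
        issued.append(cmd)
        wait(boot_delay)
    return issued
```

Calling `reboot_loop("root@test-vm", iterations=100)` would run the loop against a real VM. The sketch does not detect a hang by itself. If the VM deadlocks while stopping the arrays it will not come back up, so later ssh attempts simply fail to connect and the console has to be checked by hand.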