Re: [PATCH -next 0/9] dm-raid, md/raid: fix v6.7 regressions part2

Hi,

On 2024/03/04 19:06, Xiao Ni wrote:
On Mon, Mar 4, 2024 at 4:27 PM Xiao Ni <xni@xxxxxxxxxx> wrote:

On Mon, Mar 4, 2024 at 9:25 AM Xiao Ni <xni@xxxxxxxxxx> wrote:

On Mon, Mar 4, 2024 at 9:24 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

On 2024/03/04 9:07, Yu Kuai wrote:
Hi,

On 2024/03/03 21:16, Xiao Ni wrote:
Hi all

There is an error report from the lvm regression tests. The failing case is
lvconvert-raid-reshape-stripes-load-reload.sh. I saw this error when I
tried to fix the dm-raid regression problems too. In my patch set, after
reverting ad39c08186f8a0f221337985036ba86731d6aafe (md: Don't register
sync_thread for reshape directly), this problem doesn't appear.


Hi Kuai
How often did you see this test fail? I have been running the tests for over
two days now, for 30+ rounds, and this test has never failed in my VM.

I ran it 5 times and it failed 2 times just now.


Taking a quick look, there is still a path in raid10 where
MD_RECOVERY_FROZEN can be cleared, and in theory this problem can be
triggered that way. Can you test the following patch on top of this set?
I'll keep running the test myself.

Sure, I'll give the result later.

Hi all

This is not reliably reproducible. After applying this raid10 patch it
failed once in 28 runs. Without the raid10 patch, it failed once in 30
runs, but it failed frequently this morning.

Hi all

After running the test 152 times with kernel 6.6, the problem can appear
there too. So this is back to the same state as 6.6; this patch set just
makes the problem appear more quickly.

I verified in my VM that, after testing 100+ times, this problem can be
triggered with both v6.6 and v6.8-rc5 + this set.

I think we can merge this patchset, and figure out later why the test can
fail.

Thanks,
Kuai



Best Regards
Xiao



Regards
Xiao

Regards
Xiao

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a5f8419e2df1..7ca29469123a 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -4575,7 +4575,8 @@ static int raid10_start_reshape(struct mddev *mddev)
 	return 0;
 
 abort:
-	mddev->recovery = 0;
+	if (mddev->gendisk)
+		mddev->recovery = 0;
 	spin_lock_irq(&conf->device_lock);
 	conf->geo = conf->prev;
 	mddev->raid_disks = conf->geo.raid_disks;
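
For context, the reasoning behind the guard in a minimal sketch (the helper
below is hypothetical, it is not part of this set): an mddev created by
dm-raid has no gendisk, while a native md array always has one, so checking
mddev->gendisk tells the two apart, and the abort path then only wipes
mddev->recovery (and with it MD_RECOVERY_FROZEN) for native arrays.

/* Sketch only: names the distinction the patch above open-codes. */
static inline bool mddev_created_by_dm(struct mddev *mddev)
{
	return mddev->gendisk == NULL;
}

/*
 * With such a helper the abort path would read: a native md array may
 * clear every recovery flag, while a dm-raid array keeps
 * MD_RECOVERY_FROZEN set until the table is resumed.
 */
if (!mddev_created_by_dm(mddev))
	mddev->recovery = 0;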

Thanks,
Kuai

Thanks,
Kuai


I put the log in the attachment.

On Fri, Mar 1, 2024 at 6:03 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

link to part1:
https://lore.kernel.org/all/CAPhsuW7u1UKHCDOBDhD7DzOVtkGemDz_QnJ4DUq_kSN-Q3G66Q@xxxxxxxxxxxxxx/


Part 1 contains fixes for deadlocks when stopping sync_thread.

This set contains fixes for:
   - reshape starting unexpectedly and causing data corruption, patches 1, 5, 6;
   - a deadlock when reshape runs concurrently with IO, patch 8;
   - a lockdep warning, patch 9.

I'm running the lvm2 tests with the following script, for a few rounds now:

for t in `ls test/shell`; do
          if cat test/shell/$t | grep raid &> /dev/null; then
                  make check T=shell/$t
          fi
done
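
The loop just picks every lvm2 shell test whose script mentions "raid". An
equivalent form, as a sketch (assuming the tests live under test/shell/ in
the lvm2 tree and the filenames contain no spaces):

for t in $(grep -l raid test/shell/*); do
        make check T=shell/$(basename "$t")
done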

There are no deadlocks and no fs corruption now; however, there are still
four failing tests:

###       failed: [ndev-vanilla] shell/lvchange-raid1-writemostly.sh
###       failed: [ndev-vanilla] shell/lvconvert-repair-raid.sh
###       failed: [ndev-vanilla] shell/lvcreate-large-raid.sh
###       failed: [ndev-vanilla] shell/lvextend-raid.sh

And the failure reason is the same for all of them:

## ERROR: The test started dmeventd (147856) unexpectedly

I have no clue yet, and it seems other folks don't have this issue.

Yu Kuai (9):
    md: don't clear MD_RECOVERY_FROZEN for new dm-raid until resume
    md: export helpers to stop sync_thread
    md: export helper md_is_rdwr()
    md: add a new helper reshape_interrupted()
    dm-raid: really frozen sync_thread during suspend
    md/dm-raid: don't call md_reap_sync_thread() directly
    dm-raid: add a new helper prepare_suspend() in md_personality
    dm-raid456, md/raid456: fix a deadlock for dm-raid456 while io
      concurrent with reshape
    dm-raid: fix lockdep waring in "pers->hot_add_disk"

   drivers/md/dm-raid.c | 93 ++++++++++++++++++++++++++++++++++----------
   drivers/md/md.c      | 73 ++++++++++++++++++++++++++--------
   drivers/md/md.h      | 38 +++++++++++++++++-
   drivers/md/raid5.c   | 32 ++++++++++++++-
   4 files changed, 196 insertions(+), 40 deletions(-)

--
2.39.2


