Re: [PATCH v5 03/14] md: make sure md_do_sync() will set MD_RECOVERY_DONE

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

在 2024/02/18 16:41, Xiao Ni 写道:
On Sun, Feb 18, 2024 at 2:51 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

在 2024/02/18 13:56, Xiao Ni 写道:
On Thu, Feb 1, 2024 at 5:30 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

From: Yu Kuai <yukuai3@xxxxxxxxxx>

stop_sync_thread() will interrupt md_do_sync(), and md_do_sync() must
set MD_RECOVERY_DONE, so that follow up md_check_recovery() will
unregister sync_thread, clear MD_RECOVERY_RUNNING and wake up
stop_sync_thread().

If MD_RECOVERY_WAIT is set or the array is read-only, md_do_sync() will
return without setting MD_RECOVERY_DONE, and after commit f52f5c71f3d4
("md: fix stopping sync thread"), dm-raid switch from
md_reap_sync_thread() to stop_sync_thread() to unregister sync_thread
from md_stop() and md_stop_writes(), causing the test
shell/lvconvert-raid-reshape.sh hang.

We shouldn't switch back to md_reap_sync_thread() because it's
problematic in the first place. Fix the problem by making sure
md_do_sync() will set MD_RECOVERY_DONE.

Reported-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
Closes: https://lore.kernel.org/all/ece2b06f-d647-6613-a534-ff4c9bec1142@xxxxxxxxxx/
Fixes: d5d885fd514f ("md: introduce new personality funciton start()")
Fixes: 5fd6c1dce06e ("[PATCH] md: allow checkpoint of recovery with version-1 superblock")
Fixes: f52f5c71f3d4 ("md: fix stopping sync thread")
Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx>
---
   drivers/md/md.c | 12 ++++++++----
   1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6906d023f1d6..c65dfd156090 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8788,12 +8788,16 @@ void md_do_sync(struct md_thread *thread)
          int ret;

          /* just incase thread restarts... */
-       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery) ||
-           test_bit(MD_RECOVERY_WAIT, &mddev->recovery))
+       if (test_bit(MD_RECOVERY_DONE, &mddev->recovery))
                  return;
-       if (!md_is_rdwr(mddev)) {/* never try to sync a read-only array */
+
+       if (test_bit(MD_RECOVERY_INTR, &mddev->recovery))
+               goto skip;
+
+       if (test_bit(MD_RECOVERY_WAIT, &mddev->recovery) ||
+           !md_is_rdwr(mddev)) {/* never try to sync a read-only array */
                  set_bit(MD_RECOVERY_INTR, &mddev->recovery);
-               return;
+               goto skip;
          }

Hi all

I have a question here. The codes above means if MD_RECOVERY_WAIT is
set, it sets MD_RECOVERY_INTR. If so, the sync thread can't happen.
But from the codes in md_start function:

                  set_bit(MD_RECOVERY_WAIT, &mddev->recovery);
                  md_wakeup_thread(mddev->thread);
                  ret = mddev->pers->start(mddev);
                  clear_bit(MD_RECOVERY_WAIT, &mddev->recovery);
                  md_wakeup_thread(mddev->sync_thread);

MD_RECOVERY_WAIT means "it'll run sync thread later not interrupt it".
I guess this patch can introduce a new bug for raid5 journal?

I'm not sure what kind of problem you're talking about. After patch 4,
md_start_sync() should be the only place to register sync_thread, hence
md_start() should not see registered sync_thread. Perhaps
MD_RECOVERY_WAIT and md_wakeup_thread(mddev->sync_thread) can be removed
after patch 4?

Hi Kuai

Before this patch, the process is:
1. set MD_RECOVERY_WAIT
2. start sync thread, sync thread can't run until MD_RECOVERY_WAIT is cleared

Do you take a look at patch 4 and patch 9? sync thread will not start
before step 4 now.
3. do something
4. clear MD_RECOVERY_WAIT
5. sync thread (md_do_sync) can run

After this patch, step2 returns directly because MD_RECOVERY_INTR is
set. By this patch, MD_RECOVERY_WAIT has the same meaning as
MD_RECOVERY_INTR.  So this patch breaks one logic.

And nothing is broke here.

MD_RECOVERY_WAIT is introduced by patch
d5d885fd514fcebc9da5503c88aa0112df7514ef (md: introduce new
personality funciton start()). Then dm raid uses it to delay sync
thread too.

Back to the deadlock which this patch tries to fix.
The original report is reshape is stuck and can be reproduced easily
by these commands:
modprobe brd rd_size=34816 rd_nr=5
vgcreate test_vg /dev/ram*
lvcreate --type raid5 -L 16M -n test_lv test_vg
lvconvert -y --stripes 4 /dev/test_vg/test_lv
vgremove test_vg -ff


And can you still reporduce this problem after this patchset?

The root cause is that dm raid stopped the sync thread directly
before. It works even MD_RECOVERY_WAIT is set. Now we stop sync thread
asynchronously. Because MD_RECOVERY_WAIT is set, when stopping dm
raid, it can't set MD_RECOVERY_DONE in md_do_sync. It's the reason
it's stuck at stop_sync_thread. So to fix this deadlock, dm raid
should clear MD_RECOVERY_WAIT before stopping the sync thread.

Or are you saying that it's better to fix this problem this way? You
dind't explain that what's the problem to set MD_RECOVERY_DONE in
md_so_sync().

dm raid stop process:
1. dm_table_postsuspend_targets -> raid_postsuspend -> md_stop_writes.
2. dm_table_destroy -> raid_dtr

So we need to clear MD_RECOVERY_WAIT before calling md_stop_writes.



And to resolve this deadlock, we can use this patch:

--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -3796,8 +3796,10 @@ static void raid_postsuspend(struct dm_target *ti)
          struct raid_set *rs = ti->private;

          if (!test_and_set_bit(RT_FLAG_RS_SUSPENDED, &rs->runtime_flags)) {
+               if (test_bit(MD_RECOVERY_WAIT, &rs->md.recovery))
+                       clear_bit(MD_RECOVERY_WAIT, &rs->md.recovery);

You must make sure md_do_sync() is called after this if sync_thread is
already registered, and I don't understand yet how this is guranteed. :(

md_stop_writes -> __md_stop_writes -> stop_sync_thread guarantee this.

Best Regards
Xiao

Thanks,
Kuai


Regards
Xiao

          if (mddev_is_clustered(mddev)) {
--
2.39.2


.



.






[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux