Re: [PATCH RFC 1/1] md: Use pers->quiesce in mddev_suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

在 2024/11/05 17:01, Xiao Ni 写道:
On Tue, Nov 5, 2024 at 4:29 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

Hi,

在 2024/11/05 16:16, Mariusz Tkaczyk 写道:
On Tue,  5 Nov 2024 15:57:33 +0800
Xiao Ni <xni@xxxxxxxxxx> wrote:

One customer reports a bug: raid5 is hung when changing thread cnt
while resync is running. The stripes are all in conf->handle_list
and new threads can't handle them.

Is issue fixed with this patch? Is is missing here :)

Commit b39f35ebe86d ("md: don't quiesce in mddev_suspend()") removes
pers->quiesce from mddev_suspend/resume, then we can't guarantee sync
requests finish in suspend operation. One personality knows itself the
best. So pers->quiesce is a proper way to let personality quiesce.


Fixes: b39f35ebe86d ("md: don't quiesce in mddev_suspend()")
Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
---
   drivers/md/md.c | 6 ++++++
   1 file changed, 6 insertions(+)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 67108c397c5a..7409ecb2df68 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -482,6 +482,9 @@ int mddev_suspend(struct mddev *mddev, bool interruptible)
              return err;
      }

+    if (mddev->pers)
+            mddev->pers->quiesce(mddev, 1);
+

Shouldn't it be implemented as below? According to b39f35ebe86d, some levels are
not implementing this?

+     if (mddev->pers && mddev->pers->quiesce)
+             mddev->pers->quiesce(mddev, 1);

It's fine, the fops is never NULL, just some levels points to an empty
function.

The tricky part here is that accessing "mddev->pers" is not safe here,
'reconfig_mutex' is not held in mddev_suspend(), and can concurrent
with, for example, change levels. :(

Hi Kuai

Now mddev->suspended is protected by mddev->suspend_mutex. It should
can avoid the problem you mentioned above? level_store calls
mddev_suspend and mddev->suspended is set to mddev->suspended+1. So
other paths will return because mddev->suspended is not 0.

Yeah, level store is just an wrong example, key point here is that
mddev->pers is not protected by 'suspend_mutex'. Other places are
md_run() and md_stop(), these path doesn't hold 'suspend_mutex' right?
And they look like can concurrent with suspend, for example,
suspend_lo_store()

Did you consider just fail raid5_store_group_thread_cnt() if sync_thread
is active?

Or call ->quiesce() directly here with 'reconfig_mutex' held before
suspend?

Thanks,
Kuai


Regards
Xiao

Thanks,
Kuai


Is it reproducible with upstream kernel?

Thanks,
Mariusz

.




.






[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux