Re: [PATCH RFC] md: call md_cluster_stop() in __md_stop()

Guoqing Jiang <guoqing.jiang@xxxxxxxxx> · Mon, 15 Aug 2022 14:19:13 +0800

Hi Neil,

Sorry for later reply since I was on vacation last week.

On 8/12/22 10:16 AM, NeilBrown wrote:
[[ I noticed the e151 patch recently which seems to admit that it broke
    something.  So I looked into it and came up with this.

I just noticed it ...

    It seems to make sense, but I'm not in a position to test it.
    Guoqing: does it look OK to you?
    - NeilBrown
]]

As described in Commit: 48df498daf62 ("md: move bitmap_destroy to the
beginning of __md_stop") md_cluster_stop() needs to run before the
mdddev->thread is stopped.
The change to make this happen was reverted in Commit: e151db8ecfb0
("md-raid: destroy the bitmap after destroying the thread") due to
problems it caused.

To restore correct behaviour, make md_cluster_stop() reentrant and
explicitly call it at the start of __md_stop(), after first calling
md_bitmap_wait_behind_writes().

Fixes: e151db8ecfb0 ("md-raid: destroy the bitmap after destroying the thread")
Signed-off-by: NeilBrown <neilb@xxxxxxx>
---
  drivers/md/md-cluster.c | 1 +
  drivers/md/md.c         | 3 +++
  2 files changed, 4 insertions(+)

diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c
index 742b2349fea3..37bf0aa4ed71 100644
--- a/drivers/md/md-cluster.c
+++ b/drivers/md/md-cluster.c
@@ -1009,6 +1009,7 @@ static int leave(struct mddev *mddev)
  	     test_bit(MD_CLOSING, &mddev->flags)))
  		resync_bitmap(mddev);
  
+	mddev->cluster_info = NULL;

The above makes sense.

  	set_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state);
  	md_unregister_thread(&cinfo->recovery_thread);
  	md_unregister_thread(&cinfo->recv_thread);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index afaf36b2f6ab..a57b2dff64dd 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6238,6 +6238,9 @@ static void mddev_detach(struct mddev *mddev)
  static void __md_stop(struct mddev *mddev)
  {
  	struct md_personality *pers = mddev->pers;
+
+	md_bitmap_wait_behind_writes(mddev);
+	md_cluster_stop(mddev);
  	mddev_detach(mddev);
  	/* Ensure ->event_work is done */
  	if (mddev->event_work.func)

The md_bitmap_destroy is called in __md_stop with or without e151db8ecfb0,
and it already invokes md_bitmap_wait_behind_writes and md_cluster_stop by
md_bitmap_free. So the above is sort of redundant to me.

For the issue described in e151db8ecfb, looks like raid1d was running after
__md_stop, I am wondering if dm-raid should stop write first just like 
normal
md-raid.

diff --git a/drivers/md/md.c b/drivers/md/md.c
index afaf36b2f6ab..afc8d638eba0 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -6260,6 +6260,7 @@ void md_stop(struct mddev *mddev)
        /* stop the array and free an attached data structures.
         * This is called from dm-raid
         */
+       __md_stop_writes(mddev);
        __md_stop(mddev);
        bioset_exit(&mddev->bio_set);
        bioset_exit(&mddev->sync_set);

Thanks,
Guoqing