On Sun, Nov 8, 2020 at 6:53 AM Zhao Heming <heming.zhao@xxxxxxxx> wrote:
> [...]
> How to fix:
>
> The simple & clear solution is block the reshape action in initiator
> side. When node is executing "--grow" and detecting there is ongoing
> resyncing, it should immediately return & report error to user space.
>
> Signed-off-by: Zhao Heming <heming.zhao@xxxxxxxx>

The code looks good to me. But please revise the commit log to something
similar to the following.

========================== 8< ==========================
md/cluster: block reshape requests with resync job initiated from remote node

In cluster env, a node can start resync job when the resync cmd was executed
on a different node. Reshape requests should be blocked for resync job
initiated by any node. Current code only <condition to block reshape requests>.
This results in a deadlock in <condition> (see repro steps below). Fix this
by <adding the extra check>.

Repro steps:
...
========================== 8< ==========================

In this way, whoever reads the commit log, which could be yourself in 2021,
will understand the primary goal of this change quickly.

Does this make sense?

Thanks,
Song

> ---
>  drivers/md/md.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 98bac4f304ae..74280e353b8f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7278,6 +7278,7 @@ static int update_raid_disks(struct mddev *mddev, int raid_disks)
>  		return -EINVAL;
>  	if (mddev->sync_thread ||
>  	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
> +	    test_bit(MD_RESYNCING_REMOTE, &mddev->recovery) ||
>  	    mddev->reshape_position != MaxSector)
>  		return -EBUSY;
>
> @@ -9662,8 +9663,11 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
>  		}
>  	}
>
> -	if (mddev->raid_disks != le32_to_cpu(sb->raid_disks))
> -		update_raid_disks(mddev, le32_to_cpu(sb->raid_disks));
> +	if (mddev->raid_disks != le32_to_cpu(sb->raid_disks)) {
> +		ret = update_raid_disks(mddev, le32_to_cpu(sb->raid_disks));
> +		if (ret)
> +			pr_warn("md: updating array disks failed. %d\n", ret);
> +	}
>
>  	/*
>  	 * Since mddev->delta_disks has already updated in update_raid_disks,
> --
> 2.27.0
>
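
For readers who want to see the extended guard in isolation, here is a
minimal userspace sketch of the busy check in update_raid_disks() as the
patch leaves it. It is not kernel code: struct mddev_model, model_test_bit(),
reshape_allowed(), MAX_SECTOR and the bit positions are all hypothetical
stand-ins, and only the flag names and the order of the conditions are taken
from the diff.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit positions; the kernel defines the real ones in md.h. */
enum {
	MD_RECOVERY_RUNNING = 0,
	MD_RESYNCING_REMOTE = 1,
};

/* Stand-in for the kernel's MaxSector ("no reshape in progress"). */
#define MAX_SECTOR (~0ULL)

/* Simplified model of struct mddev: only the fields the guard reads. */
struct mddev_model {
	void *sync_thread;                   /* non-NULL while a local sync thread exists */
	unsigned long recovery;              /* recovery flag bits */
	unsigned long long reshape_position; /* MAX_SECTOR when idle */
};

/* Userspace replacement for the kernel's test_bit(). */
static bool model_test_bit(int nr, const unsigned long *addr)
{
	return (*addr >> nr) & 1UL;
}

/*
 * Mirrors the condition in the patched update_raid_disks():
 * returns 0 when a reshape may start, -EBUSY when any local or
 * remote resync/recovery/reshape activity is in progress.
 */
static int reshape_allowed(const struct mddev_model *m)
{
	if (m->sync_thread != NULL ||
	    model_test_bit(MD_RECOVERY_RUNNING, &m->recovery) ||
	    model_test_bit(MD_RESYNCING_REMOTE, &m->recovery) ||
	    m->reshape_position != MAX_SECTOR)
		return -EBUSY;
	return 0;
}
```

With an idle model (sync_thread NULL, recovery 0, reshape_position
MAX_SECTOR), reshape_allowed() returns 0. Setting only the
MD_RESYNCING_REMOTE bit makes it return -EBUSY, which is the case the
extra test_bit() line adds.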