On Sat, Nov 14, 2020 at 8:30 PM Zhao Heming <heming.zhao@xxxxxxxx> wrote:
> [...]
>
> Signed-off-by: Zhao Heming <heming.zhao@xxxxxxxx>

The fix makes sense to me. But I really hope we can improve the commit log.
I have made some changes to it, with a couple of TODOs for you (see below).
Please read it, fill in the TODOs, and revise 2/2.

Thanks,
Song

md/cluster: block reshape with remote resync job

Reshape request should be blocked with ongoing resync job. In cluster
env, a node can start resync job even if the resync cmd isn't executed
on it, e.g., user executes "mdadm --grow" on node A, sometimes node B
will start resync job. However, current update_raid_disks() only checks
local recovery status, which is incomplete. As a result, we see
(TODO describe observed issue).

Fix this issue by blocking reshape request. When a node executes "--grow"
and detects ongoing resync, it should stop and report an error to the user.

The following script reproduces the issue with (TODO: ???%) probability.

```
# on node1, node2 is the remote node.
mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"

sleep 5

mdadm --manage --add /dev/md0 /dev/sdi
mdadm --wait /dev/md0
mdadm --grow --raid-devices=3 /dev/md0

mdadm /dev/md0 --fail /dev/sdg
mdadm /dev/md0 --remove /dev/sdg
mdadm --grow --raid-devices=2 /dev/md0
```

Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: Zhao Heming <heming.zhao@xxxxxxxx>

> ---
>  drivers/md/md.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 98bac4f304ae..74280e353b8f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
[...]