Re: [PATCH 1/2] md/cluster: reshape should returns error when remote doing resyncing job

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 11/8/20 15:53, Zhao Heming wrote:
Test script (reproducible steps):
```
ssh root@node2 "mdadm -S --scan"
mdadm -S --scan
mdadm --zero-superblock /dev/sd{g,h,i}
for i in {g,h,i};do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
count=20; done

echo "mdadm create array"
mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
echo "set up array on node2"
ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"

sleep 5

mdadm --manage --add /dev/md0 /dev/sdi
mdadm --wait /dev/md0
mdadm --grow --raid-devices=3 /dev/md0

mdadm /dev/md0 --fail /dev/sdg
mdadm /dev/md0 --remove /dev/sdg
  #mdadm --wait /dev/md0
mdadm --grow --raid-devices=2 /dev/md0
```


What is the result after the above steps? Deadlock or something else.

node A & B share 3 iSCSI luns: sdg/sdh/sdi. Each lun size is 1GB, and
the disk size is more large the issue is more likely to trigger. (more
resync time, more easily trigger issues)

There is a workaround:
when adding the --wait before second --grow, the issue 1 will disappear.

Rootcause:

In cluster env, every node can start resync job even if the resync cmd
doesn't execute on it.
e.g.
There are two node A & B. User executes "mdadm --grow" on A, sometime B
will start resync job not A.

Current update_raid_disks() only check local recovery status, it's
incomplete.

How to fix:

The simple & clear solution is block the reshape action in initiator
side. When node is executing "--grow" and detecting there is ongoing
resyncing, it should immediately return & report error to user space.

Signed-off-by: Zhao Heming <heming.zhao@xxxxxxxx>
---
  drivers/md/md.c | 8 ++++++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 98bac4f304ae..74280e353b8f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7278,6 +7278,7 @@ static int update_raid_disks(struct mddev *mddev, int raid_disks)
  		return -EINVAL;
  	if (mddev->sync_thread ||
  	    test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) ||
+		test_bit(MD_RESYNCING_REMOTE, &mddev->recovery) ||
  	    mddev->reshape_position != MaxSector)
  		return -EBUSY;
@@ -9662,8 +9663,11 @@ static void check_sb_changes(struct mddev *mddev, struct md_rdev *rdev)
  		}
  	}
- if (mddev->raid_disks != le32_to_cpu(sb->raid_disks))
-		update_raid_disks(mddev, le32_to_cpu(sb->raid_disks));
+	if (mddev->raid_disks != le32_to_cpu(sb->raid_disks)) {
+		ret = update_raid_disks(mddev, le32_to_cpu(sb->raid_disks));
+		if (ret)
+			pr_warn("md: updating array disks failed. %d\n", ret);
+	}
/*
  	 * Since mddev->delta_disks has already updated in update_raid_disks,


Generally, I think it is good.

Thanks,
Guoqing



[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux