On 11/10/20 2:38 PM, Guoqing Jiang wrote:
>
>
> On 11/8/20 15:53, Zhao Heming wrote:
>> Test script (reproducible steps):
>> ```
>> ssh root@node2 "mdadm -S --scan"
>> mdadm -S --scan
>> mdadm --zero-superblock /dev/sd{g,h,i}
>> for i in {g,h,i}; do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
>>     count=20; done
>>
>> echo "mdadm create array"
>> mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
>> echo "set up array on node2"
>> ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"
>>
>> sleep 5
>>
>> mdadm --manage --add /dev/md0 /dev/sdi
>> mdadm --wait /dev/md0
>> mdadm --grow --raid-devices=3 /dev/md0
>>
>> mdadm /dev/md0 --fail /dev/sdg
>> mdadm /dev/md0 --remove /dev/sdg
>> #mdadm --wait /dev/md0
>> mdadm --grow --raid-devices=2 /dev/md0
>> ```
>>
>
> What is the result after the above steps? Deadlock or something else.

The result is described in the cover letter, in the "*** error behavior ***" section. I will add the result as comments in the V2 patch.

>
>> Node A & B share 3 iSCSI luns: sdg/sdh/sdi. Each lun size is 1GB; the
>> larger the disks, the more likely the issue is to trigger (more resync
>> time makes the issues easier to hit).
>>
>> There is a workaround: when adding the --wait before the second --grow,
>> issue 1 disappears.
>>
>> ... ...
>> +		if (ret)
>> +			pr_warn("md: updating array disks failed. %d\n", ret);
>> +	}
>> 	/*
>> 	 * Since mddev->delta_disks has already updated in update_raid_disks,
>>
>
> Generally, I think it is good.
>
> Thanks,
> Guoqing
>
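
[Editor's note: the workaround mentioned above amounts to uncommenting the `mdadm --wait` call in the reproducer, so the second `--grow` only runs after the recovery started by `--remove` has finished. A sketch of the relevant tail of the script under that assumption (untested, requires root and the same shared-lun setup as the original reproducer):]

```shell
# Tail of the reproducer with the workaround applied.
mdadm /dev/md0 --fail /dev/sdg
mdadm /dev/md0 --remove /dev/sdg
mdadm --wait /dev/md0          # workaround: block until resync/recovery is idle
mdadm --grow --raid-devices=2 /dev/md0
```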