On 11/10/20 2:38 PM, Guoqing Jiang wrote:
>
>
> On 11/8/20 15:53, Zhao Heming wrote:
>> Test script (reproducible steps):
>> ```
>> ssh root@node2 "mdadm -S --scan"
>> mdadm -S --scan
>> mdadm --zero-superblock /dev/sd{g,h,i}
>> for i in {g,h,i}; do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
>>     count=20; done
>>
>> echo "mdadm create array"
>> mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
>> echo "set up array on node2"
>> ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"
>>
>> sleep 5
>>
>> mdadm --manage --add /dev/md0 /dev/sdi
>> mdadm --wait /dev/md0
>> mdadm --grow --raid-devices=3 /dev/md0
>>
>> mdadm /dev/md0 --fail /dev/sdg
>> mdadm /dev/md0 --remove /dev/sdg
>> #mdadm --wait /dev/md0
>> mdadm --grow --raid-devices=2 /dev/md0
>> ```
>>
>
> What is the result after the above steps? Deadlock or something else.

The result is described in the cover letter, in the "*** error behavior ***" section. I will add the result as comments in the V2 patch.

>
>> Node A & B share 3 iSCSI luns: sdg/sdh/sdi. Each lun size is 1GB; the
>> larger the disks, the more likely the issue is to trigger (more resync
>> time makes the issues easier to hit).
>>
>> There is a workaround: when adding the --wait before the second --grow,
>> issue 1 disappears.
>>
>> ... ...
>> +		if (ret)
>> +			pr_warn("md: updating array disks failed. %d\n", ret);
>> +	}
>> 	/*
>> 	 * Since mddev->delta_disks has already updated in update_raid_disks,
>>
>
> Generally, I think it is good.
>
> Thanks,
> Guoqing
>
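
[Editor's note: the workaround mentioned above amounts to uncommenting the `mdadm --wait` call in the reproducer, so the second `--grow` only runs after the recovery started by `--remove` has finished. A sketch of the relevant tail of the script under that assumption (untested, requires root and the same shared-lun setup as the original reproducer):]

```shell
# Tail of the reproducer with the workaround applied.
mdadm /dev/md0 --fail /dev/sdg
mdadm /dev/md0 --remove /dev/sdg
mdadm --wait /dev/md0          # workaround: block until resync/recovery is idle
mdadm --grow --raid-devices=2 /dev/md0
```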