Hello Song,

OK, I will add a cover letter with more description and resend these patches.

Although the test scripts are almost the same, they trigger two different bugs:
patch 1/2 fixes the wrong --grow behaviour.
  (the bug happens after the second --grow command executes)
patch 2/2 fixes an md-cluster deadlock.
  (the deadlock happens before the second --grow command)

The patch 1/2 bug came from one of SUSE's customers.
When I finished the bugfix and ran the test script to verify it, I triggered
the patch 2/2 bug.

The test script for patch 2/2 adds "--bitmap-chunk=1M" when creating the array
with mdadm, and runs mkfs.xfs after the array is set up.
These two steps make the array do more resync work. A longer resync gives a
larger time window (more opportunities) to trigger the deadlock.

Thanks.

On 11/7/20 8:17 AM, Song Liu wrote:
> On Thu, Nov 5, 2020 at 5:11 AM Zhao Heming <heming.zhao@xxxxxxxx> wrote:
>>
>> Test script (reproducible steps):
>> ```
>> ssh root@node2 "mdadm -S --scan"
>> mdadm -S --scan
>> mdadm --zero-superblock /dev/sd{g,h,i}
>> for i in {g,h,i};do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
>> count=20; done
>>
>> echo "mdadm create array"
>> mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh
>> echo "set up array on node2"
>> ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"
>>
>> sleep 5
>>
>> mdadm --manage --add /dev/md0 /dev/sdi
>> mdadm --wait /dev/md0
>> mdadm --grow --raid-devices=3 /dev/md0
>>
>> mdadm /dev/md0 --fail /dev/sdg
>> mdadm /dev/md0 --remove /dev/sdg
>> #mdadm --wait /dev/md0
>> mdadm --grow --raid-devices=2 /dev/md0
>> ```
>
> I found it was hard for me to follow this set. IIUC, the two patches try to
> address one issue. Please add a cover letter and reorganize the descriptions
> like:
>
> cover-letter: error behavior, repro steps, analysis, and maybe describe the
> relationship of the two patches.
> 1/2 and 2/2: what is being fixed.
>
> Thanks,
> Song
>
> [...]
>