Re: [PATCH v4 0/2] md/cluster bugs fix

"heming.zhao@xxxxxxxx" <heming.zhao@xxxxxxxx> · Thu, 19 Nov 2020 19:43:56 +0800

I am resending the v4 patch with the correct Cc tag.

On 11/19/20 7:41 PM, Zhao Heming wrote:
> Hello List,
> 
> This series contains two patches that fix md-cluster bugs.
> 
> Both bugs can be triggered with the same test script:
> 
> ```
> # stop any running arrays on both nodes
> ssh root@node2 "mdadm -S --scan"
> mdadm -S --scan
> # wipe the first 20 MiB of each member disk
> for i in {g,h,i}; do dd if=/dev/zero of=/dev/sd$i oflag=direct bs=1M \
> count=20; done
> 
> echo "mdadm create array"
> mdadm -C /dev/md0 -b clustered -e 1.2 -n 2 -l mirror /dev/sdg /dev/sdh \
> --bitmap-chunk=1M
> echo "set up array on node2"
> ssh root@node2 "mdadm -A /dev/md0 /dev/sdg /dev/sdh"
> 
> sleep 5
> 
> mkfs.xfs /dev/md0
> # add a third disk and grow the mirror to 3 devices
> mdadm --manage --add /dev/md0 /dev/sdi
> mdadm --wait /dev/md0
> mdadm --grow --raid-devices=3 /dev/md0
> 
> # fail and remove one member, then shrink back to 2 devices
> mdadm /dev/md0 --fail /dev/sdg
> mdadm /dev/md0 --remove /dev/sdg
> mdadm --grow --raid-devices=2 /dev/md0
> ```
> 
> For details, please see the commit log of each patch.
> 
> -------
> v4:
> - revise the subject and commit log of both patches
> - no code changes
> v3:
> - patch 1/2
>    - no change
> - patch 2/2
>    - use Xiao's solution for the fix
>    - revise the "How to fix" part of the commit log
> v2:
> - patch 1/2
>    - change the patch subject
>    - add test results to the commit log
>    - no code changes
> - patch 2/2
>    - add test results to the commit log
>    - add error handling for remove_disk in hot_remove_disk
>    - add error handling for lock_comm in all callers
>    - remove the 5s timeout fix on the receive side (for process_metadata_update)
> v1:
> - initial version
> -------
> Zhao Heming (2):
>    md/cluster: block reshape with remote resync job
>    md/cluster: fix deadlock when node is doing resync job
> 
>   drivers/md/md-cluster.c | 69 +++++++++++++++++++++++------------------
>   drivers/md/md.c         | 14 ++++++---
>   2 files changed, 49 insertions(+), 34 deletions(-)
>