On 11/19/20 12:45 AM, Zhao Heming wrote:
> md-cluster uses MD_CLUSTER_SEND_LOCK to make node can exclusively send msg.
> During sending msg, node can concurrently receive msg from another node.
> [... ...]
>
> Repro steps (I only triggered 3 times with hundreds tests):

Sorry for sending the v4 patch so late. I spent more than 2 days running the
test script to try to trigger the deadlock again, but I could not reproduce it.
That is why I wrote "I only triggered 3 times with hundreds tests" in the
commit log.

>
> two nodes share 3 iSCSI luns: sdg/sdh/sdi. Each lun size is 1GB.
> ```
> ssh root@node2 "mdadm -S --scan"
> [... ...]
>
> At last, thanks for Xiao's solution.
>
> Signed-off-by: Zhao Heming <heming.zhao@xxxxxxxx>
> Suggested-by: Xiao Ni <xni@xxxxxxxxxx>
> Reviewed-by: Xiao Ni <xni@xxxxxxxxxx>
> ---
>  drivers/md/md-cluster.c | 69 +++++++++++++++++++++++------------------
>  drivers/md/md.c         |  6 ++--
>  2 files changed, 43 insertions(+), 32 deletions(-)
>