On 11/13/20 10:50 AM, Xiao Ni wrote: > > > ----- Original Message ----- >> From: "heming zhao" <heming.zhao@xxxxxxxx> >> To: "Xiao Ni" <xni@xxxxxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, song@xxxxxxxxxx, "guoqing jiang" >> <guoqing.jiang@xxxxxxxxxxxxxxx> >> Cc: "lidong zhong" <lidong.zhong@xxxxxxxx>, neilb@xxxxxxx, colyli@xxxxxxx >> Sent: Thursday, November 12, 2020 7:27:46 PM >> Subject: Re: [PATCH v2] md/cluster: fix deadlock when doing reshape job >> >> Hello, >> >> On 11/12/20 1:08 PM, Xiao Ni wrote: >>> >>> >>> On 11/11/2020 11:51 PM, Zhao Heming wrote: >>>> There is a similar deadlock in commit 0ba959774e93 >>>> ("md-cluster: use sync way to handle METADATA_UPDATED msg") >>>> My patch fixed issue is very like commit 0ba959774e93, except <c>. >>>> 0ba959774e93 step <c> is "update sb", my fix is "mdadm --remove" >>>> >>>> ... ... >>>> + !test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state), >>>> + msecs_to_jiffies(5000)); >>>> + if (rv) >>>> + return lock_token(cinfo, mddev_locked); >>>> + else >>>> + return -1; >>>> } >>> Hi Heming >>> >>> Can we handle this problem like metadata_update_start? lock_comm and >>> metadata_update_start are two users that >>> want to get token lock. lock_comm can do the same thing as >>> metadata_update_start does. If so, process_recvd_msg >>> can call md_reload_sb without waiting. All threads can work well when the >>> initiated node release token lock. Resync >>> can send message and clear MD_CLUSTER_SEND_LOCK, then lock_comm can go on >>> working. In this way, all threads >>> work successfully without failure. >>> >> >> Currently MD_CLUSTER_SEND_LOCKED_ALREADY only for adding a new disk. >> (please refer Documentation/driver-api/md/md-cluster.rst section: 5. Adding a >> new Device") >> During ADD_NEW_DISK process, md-cluster will block all other msg sending >> until metadata_update_finish calls >> unlock_comm. >> >> With your idea, md-cluster will allow to concurrently send msg. I'm not very >> familiar with all raid1 scenarios. >> But at least, you break the rule of handling ADD_NEW_DISK. it will allow send >> other msg during ADD_NEW_DISK. > > Hi Heming > > It doesn't need to change MD_CLUSTER_SEND_LOCKED_ALREADY. Is it ok to do something like this: > > diff --git a/drivers/md/md-cluster.c b/drivers/md/md-cluster.c > index 4aaf482..f6f576b 100644 > --- a/drivers/md/md-cluster.c > +++ b/drivers/md/md-cluster.c > @@ -664,29 +664,12 @@ static void recv_daemon(struct md_thread *thread) > * Takes the lock on the TOKEN lock resource so no other > * node can communicate while the operation is underway. > */ > -static int lock_token(struct md_cluster_info *cinfo, bool mddev_locked) > +static int lock_token(struct md_cluster_info *cinfo) > { > - int error, set_bit = 0; > + int error; > struct mddev *mddev = cinfo->mddev; > > - /* > - * If resync thread run after raid1d thread, then process_metadata_update > - * could not continue if raid1d held reconfig_mutex (and raid1d is blocked > - * since another node already got EX on Token and waitting the EX of Ack), > - * so let resync wake up thread in case flag is set. > - */ > - if (mddev_locked && !test_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, > - &cinfo->state)) { > - error = test_and_set_bit_lock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, > - &cinfo->state); > - WARN_ON_ONCE(error); > - md_wakeup_thread(mddev->thread); > - set_bit = 1; > - } > error = dlm_lock_sync(cinfo->token_lockres, DLM_LOCK_EX); > - if (set_bit) > - clear_bit_unlock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state); > - > if (error) > pr_err("md-cluster(%s:%d): failed to get EX on TOKEN (%d)\n", > __func__, __LINE__, error); > @@ -701,10 +684,30 @@ static int lock_token(struct md_cluster_info *cinfo, bool mddev_locked) > */ > static int lock_comm(struct md_cluster_info *cinfo, bool mddev_locked) > { > + int ret, set_bit = 0; > + > + /* > + * If resync thread run after raid1d thread, then process_metadata_update > + * could not continue if raid1d held reconfig_mutex (and raid1d is blocked > + * since another node already got EX on Token and waitting the EX of Ack), > + * so let resync wake up thread in case flag is set. > + */ > + if (mddev_locked && !test_bit(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, > + &cinfo->state)) { > + error = test_and_set_bit_lock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, > + &cinfo->state); > + WARN_ON_ONCE(error); > + md_wakeup_thread(mddev->thread); > + set_bit = 1; > + } > + > wait_event(cinfo->wait, > !test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state)); > > - return lock_token(cinfo, mddev_locked); > + ret = lock_token(cinfo, mddev_locked); > + if (ret && set_bit) > + clear_bit_unlock(MD_CLUSTER_HOLDING_MUTEX_FOR_RECVD, &cinfo->state); > + return ret; > } > > static void unlock_comm(struct md_cluster_info *cinfo) > thank you for your comments & idea. I totally understand your solution, it's better than mine. I will use this idea to file v3 patch.