On 11/10/20 2:36 PM, Guoqing Jiang wrote: > > > On 11/8/20 15:53, Zhao Heming wrote: >> There is a similar deadlock in commit 0ba959774e93 >> ("md-cluster: use sync way to handle METADATA_UPDATED msg") >> ... ... >> static void unlock_comm(struct md_cluster_info *cinfo) >> @@ -784,9 +794,11 @@ static int sendmsg(struct md_cluster_info *cinfo, struct cluster_msg *cmsg, >> { >> int ret; >> - lock_comm(cinfo, mddev_locked); >> - ret = __sendmsg(cinfo, cmsg); >> - unlock_comm(cinfo); >> + ret = lock_comm(cinfo, mddev_locked); >> + if (!ret) { >> + ret = __sendmsg(cinfo, cmsg); >> + unlock_comm(cinfo); >> + } >> return ret; >> } > > What happens if the cluster has latency issue? Let's say, node normally can't get lock during the 5s window. IOW, is the timeout value justified well? 5 seconds is a random number first from my brain when I was coding. below I gave more info for my patch. The key of patch 2 is to change from wait_event to wait_event_timeout. After applying patch, code logic of some functions are also changed. I will analyze the two sides: send & receive *** send side*** Before applying patch, sendmsg only fail when __sendmsg return error. it means dlm layer fails to convert lock. But currently code should do access the return value of lock_comm. After applying patch, there will have new fail cases: 5s timeout. all related functions: resync_bitmap - void, only print error info when error. update_bitmap_size - return err. caller already handles error case. resync_info_update - return err. caller ignore. because this is info msg, it can safely ignore. remove_disk - return err. part of caller handle error case. I forgot to modify hot_remove_disk(), will add it in v2 patch. So in future v2 patch, all callers will handle error case. gather_bitmaps - return err, caller already handles error case We can see there is only one function which doesn't care return value: resync_bitmap this function is used in leave path. if the msg doesn't send out (5s timeout), the result is other nodes won't know the failure event by BITMAP_NEEDS_SYNC. But if my understand is correct, even if missing BITMAP_NEEDS_SYNC, there is another api recover_slot(), which is triggered by dlm and do the same job. So all the sending side related functions are safe. *** receive side *** process_metadata_update the patch means it won't do metadata update job when 5s timeout. and I change my mind, 5s timeout is wrong. Receive side should do as more as possible to handle incomming msg. if there is a 5s timeout code branch, there will tigger inconsistent issue. e.g. A does --faulty, send METADATA_UPDATE to B, B receives the msg, but meets 5s timeout. it won't to update kernel md info. and it will have a gap between kernel md info and disk metadata.