Hello, On 11/12/20 1:08 PM, Xiao Ni wrote: > > > On 11/11/2020 11:51 PM, Zhao Heming wrote: >> There is a similar deadlock in commit 0ba959774e93 >> ("md-cluster: use sync way to handle METADATA_UPDATED msg") >> My patch fixed issue is very like commit 0ba959774e93, except <c>. >> 0ba959774e93 step <c> is "update sb", my fix is "mdadm --remove" >> >> ... ... >> + !test_and_set_bit(MD_CLUSTER_SEND_LOCK, &cinfo->state), >> + msecs_to_jiffies(5000)); >> + if (rv) >> + return lock_token(cinfo, mddev_locked); >> + else >> + return -1; >> } > Hi Heming > > Can we handle this problem like metadata_update_start? lock_comm and metadata_update_start are two users that > want to get token lock. lock_comm can do the same thing as metadata_update_start does. If so, process_recvd_msg > can call md_reload_sb without waiting. All threads can work well when the initiated node release token lock. Resync > can send message and clear MD_CLUSTER_SEND_LOCK, then lock_comm can go on working. In this way, all threads > work successfully without failure. > Currently MD_CLUSTER_SEND_LOCKED_ALREADY only for adding a new disk. (please refer Documentation/driver-api/md/md-cluster.rst section: 5. Adding a new Device") During ADD_NEW_DISK process, md-cluster will block all other msg sending until metadata_update_finish calls unlock_comm. With your idea, md-cluster will allow to concurrently send msg. I'm not very familiar with all raid1 scenarios. But at least, you break the rule of handling ADD_NEW_DISK. it will allow send other msg during ADD_NEW_DISK. >> static void unlock_comm(struct md_cluster_info *cinfo) >> @@ -784,9 +789,11 @@ static int sendmsg(struct md_cluster_info *cinfo, struct cluster_msg *cmsg, >> { >> int ret; >> - lock_comm(cinfo, mddev_locked); >> ... ... >> + if (mddev_is_clustered(mddev)) { >> + if (md_cluster_ops->remove_disk(mddev, rdev)) >> + goto busy; >> + } >> md_kick_rdev_from_array(rdev); >> set_bit(MD_SB_CHANGE_DEVS, &mddev->sb_flags); > These codes are not related with this deadlock problem. It's better to file a new patch > to fix checking the return value problem. > In my opinion: we should include these code in this patch. For totally fix the deadlock, md layer should return error to userspace. But with your comments, I found other potential issues of md-cluster. There still have some cluster_ops APIs, which caller doesn't care error return: .resync_finish - may cause other nodes never clean MD_RESYNCING_REMOTE. .resync_info_update - this error could be safely ignore .metadata_update_finish - may cause other nodes kernel md info is inconsistent with disk metadata. maybe I or other guys fix them in the future. > Best Regards > Xiao >