Guoqing Jiang (2021-02-13 02:49 Europe/Helsinki): > Unregister sync_thread doesn't need to hold reconfig_mutex since it > doesn't reconfigure array. > > And it could cause deadlock problem for raid5 as follows: > > 1. process A tried to reap sync thread with reconfig_mutex held after echo > idle to sync_action. > 2. raid5 sync thread was blocked if there were too many active stripes. > 3. SB_CHANGE_PENDING was set (because of write IO comes from upper layer) > which causes the number of active stripes can't be decreased. > 4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able > to hold reconfig_mutex. > > More details in the link: > https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@xxxxxxxxxxxxx/T/#t I don't understand md well enough to suggest a patch but isn't this logically a classic two lock deadlock problem where thread 1: - lock reconfig_mutex - blocked for sync that requires SB_CHANGE_PENDING thread 2: - (logically) acquire lock SB_CHANGE_PENDING - blocked for reconfig_mutex before SB_CHANGE_PENDING can be released ? The classic fix for this kind of deadlock is to require these locks to be always acquired in constant order and released in reverse order. If you had a rule that SB_CHANGE_PENDING cannot be set or cleared without already having reconfig_mutex, wouldn't that prevent this deadlock? (If I understood the issue correctly, it's currently possible to set but not clear the SB_CHANGE_PENDING without having reconfig_mutex.) Another possibility is to expect SB_CHANGE_PENDING to be set as part of sync process required change to "idle" (write to sync_action). In that case the logic would be you cannot have reconfig_mutex already locked before setting (logically acquiring lock) SB_CHANGE_PENDING. So the transfer from active to idle would require first setting SB_CHANGE_PENDING, doing the required processing (getting and freeing reconfig_mutex in process) and then clearing SB_CHANGE_PENDING. Basically the rule would be you must lock SB_CHANGE_PENDING before you can lock reconfig_mutex. If I've understood correctly SB_CHANGE_PENDING is not technically a lock but it's logically used like it were a lock. Would either of these make sense for the overall design? Obviously, if it doesn't hurt the performance too much, using a single lock for everything that needs to be serialized would be much easier. -- Mikko