On 2/11/21 08:28, Song Liu wrote:
On Tue, Feb 9, 2021 at 6:22 PM Guoqing Jiang
<guoqing.jiang@xxxxxxxxxxxxxxx> wrote:
Unregistering sync_thread doesn't need to hold reconfig_mutex since it
doesn't reconfigure the array.
Holding the mutex there can also cause a deadlock for raid5, as follows:
1. Process A tried to reap the sync thread with reconfig_mutex held after
echoing idle to sync_action.
2. The raid5 sync thread was blocked because there were too many active
stripes.
3. SB_CHANGE_PENDING was set (because write IO came from the upper layer),
which prevents the number of active stripes from decreasing.
4. SB_CHANGE_PENDING can't be cleared since md_check_recovery was not able
to take reconfig_mutex.
More details in the link:
https://lore.kernel.org/linux-raid/5ed54ffc-ce82-bf66-4eff-390cb23bc1ac@xxxxxxxxxxxxx/T/#t
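The dependency cycle in the four steps above can be sketched with a toy,
single-threaded model. This is only an illustration under simplifying
assumptions, not kernel code: the struct, the toy_* function names and the
round-based loop are all hypothetical, and the real md_check_recovery and
raid5 stripe handling are far more involved.

```c
#include <stdbool.h>

/* Toy model only: every name below is hypothetical, not a kernel API. */
struct toy_mddev {
    bool reconfig_mutex_held;  /* stands in for reconfig_mutex being held by process A */
    bool sb_change_pending;    /* stands in for the SB_CHANGE_PENDING flag */
    bool sync_thread_blocked;  /* sync thread waiting for active stripes to drain */
};

/* Models step 4: the flag can be cleared only if the mutex can be taken. */
void toy_check_recovery(struct toy_mddev *m)
{
    if (m->reconfig_mutex_held)
        return;                /* could not take the mutex, nothing is cleared */
    m->sb_change_pending = false;
}

/* Models steps 2 and 3: stripes drain only once the flag is cleared. */
void toy_sync_thread_step(struct toy_mddev *m)
{
    if (!m->sb_change_pending)
        m->sync_thread_blocked = false;
}

/*
 * Models step 1: process A waits for the sync thread to finish, either
 * with or without the mutex held. Returns true if the sync thread
 * unblocks within max_rounds, false if no progress is ever made.
 */
bool toy_reap_sync_thread(bool hold_mutex, int max_rounds)
{
    struct toy_mddev m = {
        .reconfig_mutex_held = hold_mutex,
        .sb_change_pending = true,
        .sync_thread_blocked = true,
    };

    for (int i = 0; i < max_rounds; i++) {
        toy_check_recovery(&m);
        toy_sync_thread_step(&m);
        if (!m.sync_thread_blocked)
            return true;
    }
    return false;
}
```

In this model, toy_reap_sync_thread(true, 10) never makes progress, while
toy_reap_sync_thread(false, 10) completes in the first round, which is the
behaviour the patch aims for by not holding the mutex while reaping.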
Reported-and-tested-by: Donald Buczek <buczek@xxxxxxxxxxxxx>
Signed-off-by: Guoqing Jiang <guoqing.jiang@xxxxxxxxxxxxxxx>
Thanks for debugging the issue. However, I am not sure whether this is
the proper fix. For example, would this break dm-raid.c:raid_message()?
IIUC, raid_message() calls md_reap_sync_thread() without holding
reconfig_mutex, no?
Oops, I didn't notice that dm-raid calls it, though md did call it with
reconfig_mutex held. On the other hand, it proves we don't need to
call md_reap_sync_thread with the mutex held.
Thanks,
Guoqing