On 11/24/23 9:29 AM, Song Liu wrote:
On Wed, Nov 8, 2023 at 10:22 AM Junxiao Bi <junxiao.bi@xxxxxxxxxx> wrote:
This reverts commit 5e2cf333b7bd5d3e62595a44d598a254c697cd74.
That commit introduced the following race, which can cause a system hang.
md_write_start:                          raid5d:
// mddev->in_sync == 1
set "MD_SB_CHANGE_PENDING"
                                         // running before md_write_start wakes it up
                                         waiting "MD_SB_CHANGE_PENDING" cleared
                                         >>>>>>>>> hung
wakeup mddev->thread
...
waiting "MD_SB_CHANGE_PENDING" cleared
>>>> hung, raid5d should clear this flag
but is hung on the same flag.
The issue that the reverted commit was fixing is fixed in a new way by the last patch in this series.
Fixes: 5e2cf333b7bd ("md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d")
Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx>
The set looks good to me. Thanks!
Thanks for the review.
Quick question: from the earlier thread, the issue was observed in
production. Have you reproduced the issue and thus verified the fix
works as expected?
I didn't try to reproduce this, because the system hung in the code that
the bad commit added; after reverting it, this issue can no longer
reproduce.
Thanks,
Junxiao.
Thanks,
Song