Re: [PATCH 2/2] md/cluster: fix deadlock when doing reshape job

Song Liu <song@xxxxxxxxxx> · Mon, 9 Nov 2020 10:06:56 -0800

On Sun, Nov 8, 2020 at 6:02 PM heming.zhao@xxxxxxxx
<heming.zhao@xxxxxxxx> wrote:
>
> Please note that I gave two solutions for this bug in the cover letter.
> This patch uses solution 2. For details, please check the cover letter.
>
> Thank you.
>

[...]

> >
> > How to fix:
> >
> > There are two sides to fix (or break the dead loop):
> > 1. On the sending side, modify lock_comm so that it returns
> >     success/failure.
> >     This makes the mdadm command return an error when lock_comm times out.
> > 2. On the receiving side, process_metadata_update needs to add error
> >     handling.
> >     Currently, other message types either won't trigger an error or don't
> >     need to return the error to the sender, so only
> >     process_metadata_update needs to be modified.
> >
> > Either of 1 & 2 can fix the hanging issue, but I prefer to fix both sides.
> >

Similar comments apply here on making the commit log easier to understand.
Besides that, please split the change into two commits, for fix #1 and #2
respectively.
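
To illustrate why the two fixes separate cleanly, here is a rough user-space
sketch of the idea, not the actual patch. Every name ending in _sketch is
hypothetical, the retry loop merely stands in for whatever timed wait the
kernel code would use, and the error codes are assumptions chosen for the
example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the cluster communication state. */
struct comm_sketch {
	bool busy;	/* true while another sender holds the channel */
};

/*
 * Sender side (fix #1): give up after max_tries polls and report the
 * failure instead of waiting forever.
 * Returns 0 on success, -ETIMEDOUT if the channel never became free.
 */
static int lock_comm_sketch(struct comm_sketch *c, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (!c->busy) {
			c->busy = true;
			return 0;
		}
		/* Real code would sleep or wait on an event here. */
	}
	return -ETIMEDOUT;
}

static void unlock_comm_sketch(struct comm_sketch *c)
{
	c->busy = false;
}

/* Caller: propagate the error so the command fails rather than hangs. */
static int send_update_sketch(struct comm_sketch *c, int max_tries)
{
	int ret = lock_comm_sketch(c, max_tries);

	if (ret)
		return ret;
	/* ... the message would be sent here ... */
	unlock_comm_sketch(c);
	return 0;
}

/*
 * Receiver side (fix #2): report an error when the update cannot be
 * processed, instead of blocking. The boolean argument is an assumed
 * placeholder for "the resource the handler needs is available".
 */
static int process_update_sketch(bool resource_available)
{
	if (!resource_available)
		return -EAGAIN;
	return 0;
}
```

In this sketch the sender-side change touches only the lock and its caller,
and the receiver-side change touches only the message handler, so each can
be reviewed as its own commit.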

Thanks,
Song