On 11/10/20 2:06 AM, Song Liu wrote:
> On Sun, Nov 8, 2020 at 6:02 PM heming.zhao@xxxxxxxx
> <heming.zhao@xxxxxxxx> wrote:
>>
>> Please note, I gave two solutions for this bug in cover-letter.
>> This patch uses solution 2. For detail, please check cover-letter.
>>
>> Thank you.
>>
>
> [...]
>
>>>
>>> How to fix:
>>>
>>> There are two sides to fix (or break the dead loop):
>>> 1. on sending msg side, modify lock_comm, change it to return
>>>    success/failed.
>>>    This will make mdadm cmd return error when lock_comm is timeout.
>>> 2. on receiving msg side, process_metadata_update need to add error
>>>    handling.
>>>    currently, other msg types won't trigger error or error doesn't need
>>>    to return sender. So only process_metadata_update need to modify.
>>>
>>> Ether of 1 & 2 can fix the hunging issue, but I prefer fix on both side.
>>>
>
> Similar comments on how to make the commit log easy to understand.
> Besides that, please split the change into two commits, for fix #1 and #2
> respectively.
>

What I meant in my comment is that solution 2 itself has two sub-solutions:
fix the sending side or fix the receiving side. (Strictly speaking there are
three: sending side only, receiving side only, or both sides.)

Sending side, related functions in patch 2: sendmsg & lock_comm
(code flow: sendmsg => lock_comm)

Receiving side, related functions in patch 2: process_recvd_msg &
process_metadata_update
(code flow: process_recvd_msg => process_metadata_update)

Breaking the wait on either side is enough to break the deadlock.
Patch 2 fixes both sides.
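
To make the two sides concrete, here is a minimal userspace sketch of the idea. It is not the md-cluster code: the struct, its fields, the `_sketch` function names, and the retry counter standing in for a timed wait are all assumptions for illustration. The only point it shows is that each side gives up after a bounded wait and returns an error to its caller instead of waiting forever.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state each side waits on. */
struct comm_state {
	bool lock_held;     /* sending side: communication lock is taken */
	bool recv_blocked;  /* receiving side: update cannot proceed yet */
};

/* Sending side: bounded wait, returns 0 on success or -ETIMEDOUT. */
int lock_comm_sketch(struct comm_state *s, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		if (!s->lock_held) {
			s->lock_held = true;
			return 0;
		}
		/* Real code would sleep with a timeout here (assumption). */
	}
	return -ETIMEDOUT;
}

/* Code flow: sendmsg => lock_comm. The error is passed up to the caller. */
int sendmsg_sketch(struct comm_state *s, int max_tries)
{
	int ret = lock_comm_sketch(s, max_tries);

	if (ret)
		return ret;
	/* ... the message would be sent here ... */
	s->lock_held = false;
	return 0;
}

/* Receiving side: bounded wait with error handling. */
int process_metadata_update_sketch(struct comm_state *s, int max_tries)
{
	assert(max_tries > 0);
	for (int i = 0; i < max_tries; i++) {
		if (!s->recv_blocked)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Code flow: process_recvd_msg => process_metadata_update. */
int process_recvd_msg_sketch(struct comm_state *s, int max_tries)
{
	return process_metadata_update_sketch(s, max_tries);
}
```

In this sketch either function returning -ETIMEDOUT is enough for one party to stop waiting, which is why fixing either side alone already breaks the loop.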