Re: Potential race in dlm based messaging md-cluster.c

>>> On 5/7/2015 at 05:14 PM, in message <554B2CED.5050903@xxxxxxxxx>, Abhijit
Bhopatkar <abhopatk@xxxxxxxxx> wrote: 
> On 07/05/15 8:13 am, Lidong Zhong wrote: 
>>>>> On 5/5/2015 at 08:10 PM, in message <5548B32B.5070904@xxxxxxxxx>, Abhijit 
> > Bhopatkar <abhopatk@xxxxxxxxx> wrote: 
> >> On 05/05/15 3:14 pm, Abhijit Bhopatkar wrote: 
> >>> On 05/05/15 2:52 pm, Lidong Zhong wrote: 
> >>>>>>> On 5/1/2015 at 02:36 AM, in message <5542763C.90202@xxxxxxxxx>, Abhijit 
> >>>> Bhopatkar <abhopatk@xxxxxxxxx> wrote: 
> >> 
> >> <snip> 
> >> 
> >>>>> 
> >>>>> To illustrate the problem consider timeline for two senders and one 
> >>>>> receiver (we will ignore receive part for Sender2 node) 
> >>>>> 
> >>>>> Sender1                 Sender2                 Receiver
> >>>>> Get EX on TOKEN         Get EX on TOKEN
> >>>>> <granted>               <wait till granted>
> >>>>>
> >>>>> Get EX on MSG
> >>>>> write LVB
> >>>>> down MSG to CR
> >>>>> Get EX on ACK
> >>>>> <wait till granted>
> >>>>>                                                 BAST for ACK
> >>>>>                                                 Get CR on MSG
> >>>>>                                                 read LVB
> >>>>>                                                 process
> >>>>>                                                 release ACK
> >>>>> AST for ACK
> >>>>> down ACK to CR
> >>>>> release MSG
> >>>>> release TOKEN
> >>>>>                         <granted>
> >>>>>                         Get EX on MSG
> >>>> 
> >>>> I am afraid this corner case can never occur. Sender2 will be blocked
> >>>> on getting the EX lock on the MSG resource until the receivers release
> >>>> their locks. The receivers' requests to upconvert CR to EX on MSG should
> >>>> be put into the convert queue before Sender2's request is put into the
> >>>> wait queue, because Sender2 has to wait until the EX on TOKEN is
> >>>> released.
> >>>> 
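For reference, the blocking described above follows from the usual DLM lock-mode compatibility rules: CR coexists with every mode except EX, and EX coexists only with NL. Below is a minimal user-space sketch of those rules in Python. It is an illustration only, not md-cluster code, and the names COMPAT, compatible and can_grant are made up for this example.

```python
# Toy model of DLM lock-mode compatibility (illustration only, not kernel code).
# Modes: NL (null), CR (concurrent read), CW (concurrent write),
#        PR (protected read), PW (protected write), EX (exclusive).
COMPAT = {
    "NL": {"NL", "CR", "CW", "PR", "PW", "EX"},
    "CR": {"NL", "CR", "CW", "PR", "PW"},
    "CW": {"NL", "CR", "CW"},
    "PR": {"NL", "CR", "PR"},
    "PW": {"NL", "CR"},
    "EX": {"NL"},
}


def compatible(held, requested):
    """True if a lock in mode `requested` can coexist with one in mode `held`."""
    return requested in COMPAT[held]


def can_grant(granted_modes, requested):
    """True if `requested` is compatible with every lock already granted."""
    return all(compatible(held, requested) for held in granted_modes)
```

With one receiver still holding CR on MSG, can_grant(["CR"], "EX") is False, so Sender2's EX request has to wait; additional CR holders are fine.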
> >>> Yes, my initial thought of losing a message is not correct. The EX on
> >>> MESSAGE won't be granted immediately to Sender2. However, there is
> >>> still a deadlock.
> >>>
> >>> Perhaps I am missing something, but as far as I can see nothing prevents
> >>> Sender2 from acquiring EX on TOKEN _and_ MESSAGE __before__ the up
> >>> convert from the receiver is queued. Consider adding an unusual delay
> >>> right after ACK is released on the receiver. Sender1 will immediately
> >>> release MESSAGE and TOKEN. The receiver is still delayed for whatever
> >>> reason. Sender2 gets the TOKEN grant and immediately queues EX for
> >>> MESSAGE (note this is before EX for MESSAGE is queued by the receiver).
> >>> 
> > 
> > Yes, there is a possibility leading to deadlock here. 
> >>> DLM will (should?) return an error for the up convert saying there is
> >>> a deadlock (-EDEADLK ??)
> >>> 
> >> 
> >> On further investigation of the dlm code: since we do not set the
> >> DLM_LKF_CONVDEADLK flag on our locks, in the above deadlock case the
> >> receiver's request to up convert will simply be canceled. The code will
> >> then proceed as expected, since the receiver still holds CR on MESSAGE,
> >> and after the processing we will release the CR.
> >>
> >> So now my question changes to:
> >>
> >> Why do we up convert the MESSAGE to EX in the first place?
> >>
> >> Was the receiver's EX on MESSAGE intended to serialize all receivers
> >> before taking CR on ACK?
> >> 
> > 
> > Yes, it is. Otherwise, each receiver may get duplicate messages when it
> > tries to get CR on ACK while the sender doesn't downconvert EX on ACK in
> > time.
>  
> If I am reading this right, are we afraid of getting a second BAST call on
> the receiver? The sender is holding EX on ACK, and the receiver releases CR
> on ACK after processing the message. But the sender is delayed in releasing
> EX on ACK. The receiver re-queues CR on ACK, which might trigger a BAST?
> (Note the receiver won't get the CR grant until the sender has released EX.)
>  

Yes, and I think you have explained the reason well enough. Actually, at first
we did as you said, but we found that sometimes the receiver might get
duplicate messages. So we made a change here.

> A new CR by the receiver on ACK will _not_ trigger a BAST call. Instead, no
> AST will be called until the original EX on ACK held by the sender is
> released. BAST is called only on locks

The description here seems right to me, but I don't get the connection, sorry.

> that are already granted. Since we trigger message processing only on BAST,
> I don't see a possibility of duplicate messages here.
>  
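The claim in the quoted paragraph (a BAST is delivered only to holders of already granted locks) can be illustrated with a toy model of a single lock resource. This is a Python sketch under simplifying assumptions: only CR and EX are modelled, there is no convert queue, and Resource and demo() are hypothetical names, not anything from the dlm or md-cluster code. It only models the rule as stated; it cannot show how the duplicate messages observed earlier actually arose.

```python
# Toy model of BAST delivery on one lock resource (illustration only).
# Only CR and EX are modelled; Resource and demo() are hypothetical names.

def compatible(held, requested):
    # CR coexists with CR; EX coexists with nothing.
    return held == "CR" and requested == "CR"


class Resource:
    def __init__(self):
        self.granted = {}   # owner -> granted mode
        self.waiting = []   # (owner, mode), in arrival order
        self.basts = []     # owners that were sent a BAST, in order

    def request(self, owner, mode):
        blockers = [o for o, m in self.granted.items()
                    if not compatible(m, mode)]
        if not blockers:
            self.granted[owner] = mode
            return True
        # BASTs go to holders of granted locks, never to the waiter itself.
        self.waiting.append((owner, mode))
        self.basts.extend(blockers)
        return False

    def release(self, owner):
        del self.granted[owner]
        # Grant queued requests in order for as long as they are compatible.
        while self.waiting:
            o, m = self.waiting[0]
            if not all(compatible(h, m) for h in self.granted.values()):
                break
            self.granted[o] = m
            self.waiting.pop(0)


def demo():
    ack = Resource()
    ack.request("receiver", "CR")   # receiver idles holding CR on ACK
    ack.request("sender", "EX")     # blocked: BAST goes to the receiver
    ack.release("receiver")         # receiver is done; sender's EX is granted
    ack.request("receiver", "CR")   # sender is slow: receiver simply waits
    return ack
```

In this model demo().basts contains the receiver once and the sender once: the receiver's re-queued CR sends a BAST to the sender (the current holder) and the receiver itself sees no second BAST.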
> > 
> > One way I can think of to fix the deadlock now is setting the
> > DLM_LKF_NOQUEUE flag when the sender tries to get EX on MESSAGE. It should
> > keep trying until all the receivers release their locks on MESSAGE. Do you
> > have any better idea that does not add more lock resources? We already
> > have three for transmitting messages.
> > 
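The retry idea proposed above could look roughly like the following. This is a user-space Python sketch, not a patch: ToyResource, Busy and lock_ex_with_retry are made-up names, and Busy stands in for the -EAGAIN status that a DLM_LKF_NOQUEUE request reports when it cannot be granted immediately. The retry count and delay are arbitrary assumptions.

```python
import time


class Busy(Exception):
    """Stands in for the -EAGAIN status of a failed NOQUEUE request."""


class ToyResource:
    def __init__(self):
        self.holders = {}  # owner -> mode

    def try_lock_ex(self, owner):
        # NOQUEUE semantics: grant right now or fail; never sit on a queue.
        if any(o != owner for o in self.holders):
            raise Busy()
        self.holders[owner] = "EX"

    def unlock(self, owner):
        self.holders.pop(owner, None)


def lock_ex_with_retry(res, owner, retries=10, delay=0.0, on_retry=None):
    """Keep trying for EX until other holders are gone; return attempts used."""
    for attempt in range(retries):
        try:
            res.try_lock_ex(owner)
            return attempt
        except Busy:
            if on_retry is not None:
                on_retry(attempt)   # hook used only to drive the example
            time.sleep(delay)
    raise TimeoutError("EX on MESSAGE not granted after %d tries" % retries)
```

Because the sender never sits on the wait queue, a receiver's later upconvert request cannot end up queued behind it, which is the ordering that led to the deadlock discussed above.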
> It's exactly what I was thinking about, and it sounds like a good solution.
> However, as said above, I don't think the receiver's EX on ACK is really
> needed.
>  

As already explained above. If you see no other problem, then what is needed
here is a patch to fix the potential deadlock.

Regards,
Lidong
> Regards, 
> Abhijit 
>  
> > Regards, 
> > Lidong 
> > 
> > 
> >> Since there is a possibility that we might lose out on this up convert in
> >> a race condition, can we simply eliminate this up conversion? (The CR is
> >> preventing the next Sender from taking EX on MESSAGE anyway.)
> >> 
> >> Regards, 
> >> Abhijit 
> >> 
> >> -- 
> >> To unsubscribe from this list: send the line "unsubscribe linux-raid" in 
> >> the body of a message to majordomo@xxxxxxxxxxxxxxx 
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html 
> >> 
> >> 
> > 
>  
>  
>  
