> I've always thought that NLM was a less-than-perfect locking protocol, but I
> recently discovered an aspect of it that is worse than I imagined.
>
> Suppose client-A holds a lock on some region of a file, and client-B makes a
> non-blocking lock request for that region.
> Now suppose that just before handling that request the lockd thread on the
> server stalls - for example due to excessive memory pressure causing a
> kmalloc to take 11 seconds (rare, but possible. Such allocations never fail,
> they just block until they can be served).
>
> During these 11 seconds (say, at the 5 second mark), client-A releases the lock -
> the UNLOCK request to the server queues up behind the non-blocking LOCK
> from client-B.
>
> The default retry time for NLM in Linux is 10 seconds (even for TCP!) so NLM
> on client-B resends the non-blocking LOCK request, and it queues up behind
> the UNLOCK request.
>
> Now finally the lockd thread gets some memory/CPU time and starts
> handling requests:
>    LOCK from client-B - DENIED
>    UNLOCK from client-A - OK
>    LOCK from client-B - OK
>
> Both replies to client-B have the same XID so client-B will believe whichever
> one it gets first - DENIED.
>
> So now we have the situation where client-B doesn't think it holds a lock, but
> the server thinks it does. This is not good.
>
> I think this explains a locking problem that a customer is seeing. The
> application seems to busy-wait for the lock using non-blocking LOCK
> requests. Each LOCK request has a different 'svid' so I assume each comes
> from a different process. If you busy-wait from the one process this problem
> won't occur.
>
> Having a reply-cache on the server lockd might help, but such things easily fill
> up and cannot provide a guarantee.
>
> Having a longer timeout on the client would probably help too. At the very
> least we should increase the maximum timeout beyond 20 seconds.
> (assuming I am reading the code correctly, the client resend timeout is based on
> nlmsvc_timeout which is set from nlm_timeout which is restricted to the
> range 3-20).
>
> Forcing the xid to change on every retransmit (for NLM) would ensure that
> we only accept the last reply, which I think is safe.

That sounds like a good solution to me. Since the requests are non-blocking,
each request should be considered separate from the others.

Frank
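
To make the sequence above concrete, here is a minimal Python sketch of the
queueing race and of the proposed XID change. It is a toy model, not lockd
code: run_server, client_b_view, simulate, and the XID values are all made up
for illustration, and the client is assumed to accept the first reply whose
XID matches the one it is currently waiting for.

```python
def run_server(queue):
    """Process queued requests in arrival order once the stall ends.

    Returns the list of replies and the final lock holder as the server sees it.
    """
    holder = "A"  # client-A holds the lock when the stall begins
    replies = []
    for client, op, xid in queue:
        if op == "LOCK":
            if holder is None or holder == client:
                holder = client
                status = "OK"
            else:
                status = "DENIED"  # non-blocking request: refuse immediately
        else:  # UNLOCK
            if holder == client:
                holder = None
            status = "OK"
        replies.append((client, xid, status))
    return replies, holder


def client_b_view(replies, awaited_xid):
    """Return the status client-B accepts: the first reply carrying the awaited XID."""
    for client, xid, status in replies:
        if client == "B" and xid == awaited_xid:
            return status
    return None


def simulate(new_xid_on_retransmit):
    """Replay the scenario; returns (what client-B believes, server's lock holder)."""
    first_xid = 100  # hypothetical XID values
    retry_xid = 101 if new_xid_on_retransmit else first_xid
    queue = [
        ("B", "LOCK", first_xid),   # original request, stuck behind the stall
        ("A", "UNLOCK", 200),       # client-A releases at the 5 second mark
        ("B", "LOCK", retry_xid),   # client-B retransmits after 10 seconds
    ]
    replies, holder = run_server(queue)
    # After retransmitting, client-B is waiting for retry_xid.
    return client_b_view(replies, retry_xid), holder


if __name__ == "__main__":
    print("same xid on retransmit:", simulate(False))
    print("new xid on retransmit: ", simulate(True))
```

With the XID reused, client-B accepts the stale DENIED while the server
records B as the holder. With a fresh XID on the retransmit, the stale reply
no longer matches and both sides agree that B holds the lock.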