On Wed, Mar 30 2016, Chuck Lever wrote:

> Hi Neil-
>
> Ramblings inline.
>
>> On Mar 27, 2016, at 7:40 PM, NeilBrown <neilb@xxxxxxxx> wrote:
>>
>> I've always thought that NLM was a less-than-perfect locking protocol,
>> but I recently discovered an aspect of it that is worse than I imagined.
>>
>> Suppose client-A holds a lock on some region of a file, and client-B
>> makes a non-blocking lock request for that region.
>> Now suppose that just before handling that request the lockd thread
>> on the server stalls - for example due to excessive memory pressure
>> causing a kmalloc to take 11 seconds (rare, but possible. Such
>> allocations never fail, they just block until they can be served).
>>
>> During these 11 seconds (say, at the 5 second mark), client-A releases
>> the lock - the UNLOCK request to the server queues up behind the
>> non-blocking LOCK from client-B.
>>
>> The default retry time for NLM in Linux is 10 seconds (even for TCP!),
>> so NLM on client-B resends the non-blocking LOCK request, and it queues
>> up behind the UNLOCK request.
>>
>> Now finally the lockd thread gets some memory/CPU time and starts
>> handling requests:
>>   LOCK from client-B   - DENIED
>>   UNLOCK from client-A - OK
>>   LOCK from client-B   - OK
>>
>> Both replies to client-B have the same XID, so client-B will believe
>> whichever one it gets first - DENIED.
>>
>> So now we have the situation where client-B doesn't think it holds a
>> lock, but the server thinks it does. This is not good.
>>
>> I think this explains a locking problem that a customer is seeing. The
>> application seems to busy-wait for the lock using non-blocking LOCK
>> requests. Each LOCK request has a different 'svid', so I assume each
>> comes from a different process. If you busy-wait from the one process,
>> this problem won't occur.
>>
>> Having a reply-cache on the server lockd might help, but such things
>> easily fill up and cannot provide a guarantee.
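The race above comes down to the RPC client matching replies by XID alone. A toy simulation (hypothetical Python, not lockd or sunrpc code; the XID value and reply ordering are invented for illustration) shows how the stale DENIED wins:

```python
# Hypothetical sketch of the retransmit race: a retransmitted
# non-blocking LOCK shares its XID with the original request, so the
# client accepts whichever matching reply arrives first.

def client_result(replies, xid):
    """Return the first reply whose XID matches, as the RPC layer does."""
    for reply_xid, status in replies:
        if reply_xid == xid:
            return status
    return None

# Server processes the queued requests in order once lockd unblocks:
#   LOCK(client-B, xid=7)  -> DENIED  (client-A still holds the lock)
#   UNLOCK(client-A)       -> OK
#   LOCK(client-B, xid=7)  -> OK      (retransmit, same XID)
replies_to_B = [(7, "DENIED"), (7, "GRANTED")]

print(client_result(replies_to_B, 7))  # prints DENIED
# ...yet the server granted the lock on the retransmit: the client and
# server now disagree about who holds the lock.
```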
>
> What would happen if the client serialized non-blocking
> lock operations for each inode? Or, if a non-blocking
> lock request is outstanding on an inode when another
> such request is made, can EAGAIN be returned to the
> application?

I cannot quite see how this is relevant. I imagine one app on one
client is using non-blocking requests to try to get a lock, and a
different app on a different client holds, and then drops, the lock.
I don't see how serialization on any one client will change that.

>> Having a longer timeout on the client would probably help too. At the
>> very least we should increase the maximum timeout beyond 20 seconds.
>> (assuming I'm reading the code correctly, the client resend timeout is
>> based on nlmsvc_timeout, which is set from nlm_timeout, which is
>> restricted to the range 3-20).
>
> A longer timeout means the client is slower to respond to
> slow or lost replies (ie, adjusting the timeout is not
> consequence free).

True. But for NFS/TCP the default timeout is 60 seconds. For NLM/TCP
the default is 10 seconds and the hard upper limit is 20 seconds.
This, at least, can be changed without fearing consequences.

> Making the RTT slightly longer than this particular server
> needs to recharge its batteries seems like a very local
> tuning adjustment.

This is exactly what I've asked our partner to experiment with. No
results yet.

>> Forcing the xid to change on every retransmit (for NLM) would ensure
>> that we only accept the last reply, which I think is safe.
>
> To make this work, then, you'd make client-side NLM
> RPCs soft, and the upper layer (NLM) would handle
> the retries. When a soft RPC times out, that would
> "cancel" that XID and the client would ignore
> subsequent replies for it.

Soft, with zero retransmits, I assume. The NLM client already assumes
"hard" (it doesn't pay attention to the "soft" NFS option). Moving
that indefinite retry from sunrpc to lockd would probably be easy
enough.
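The proposed fix can be sketched like this (a hypothetical Python model of the design, not kernel code; class and method names are invented): each retransmit gets a fresh XID, and replies to superseded XIDs are discarded, so only the reply to the *last* request counts:

```python
# Sketch of "force the xid to change on every retransmit": stale
# replies no longer match the current XID and are ignored.

class NlmCall:
    def __init__(self):
        self.next_xid = 1
        self.current_xid = None

    def send(self):
        # Fresh XID on every (re)transmit; any earlier XID becomes stale.
        self.current_xid = self.next_xid
        self.next_xid += 1
        return self.current_xid

    def receive(self, xid, status):
        # Accept only replies to the most recent transmission.
        if xid != self.current_xid:
            return None
        return status

call = NlmCall()
first = call.send()    # original non-blocking LOCK
second = call.send()   # retransmitted LOCK, new XID

print(call.receive(first, "DENIED"))    # prints None: stale, ignored
print(call.receive(second, "GRANTED"))  # prints GRANTED: last reply wins
```

With this model, the DENIED reply from the race above is dropped, and the client ends up agreeing with the server that it holds the lock.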
>
> The problem is what happens when the server has
> received and processed the original RPC, but the
> reply itself is lost (say, because the TCP
> connection closed due to a network partition).
>
> Seems like there is similar capacity for the client
> and server to disagree about the state of the lock.

I think that as long as the client sees the reply to the *last*
request, they will end up agreeing. So if requests can be re-ordered
you could have problems, but TCP protects us against that.

I'll have a look at what it would take to get NLM to re-issue
requests.

Thanks,
NeilBrown

> --
> Chuck Lever