On 3/28/2016 12:04 PM, Frank Filz wrote:
I've always thought that NLM was a less-than-perfect locking protocol, but I
recently discovered an aspect of it that is worse than I imagined.
Suppose client-A holds a lock on some region of a file, and client-B makes a
non-blocking lock request for that region.
Now suppose that just before handling that request the lockd thread on the
server stalls, for example due to excessive memory pressure causing a
kmalloc to take 11 seconds (rare, but possible; such allocations never fail,
they just block until they can be served).
During these 11 seconds (say, at the 5 second mark), client-A releases the
lock; the UNLOCK request to the server queues up behind the non-blocking LOCK
from client-B.
The default retry time for NLM in Linux is 10 seconds (even for TCP!) so NLM
on client-B resends the non-blocking LOCK request, and it queues up behind
the UNLOCK request.
Now finally the lockd thread gets some memory/CPU time and starts
handling requests:
LOCK from client-B - DENIED
UNLOCK from client-A - OK
LOCK from client-B - OK
Both replies to client-B have the same XID so client-B will believe
whichever one it gets first: DENIED.
So now we have the situation where client-B doesn't think it holds a lock,
but the server thinks it does. This is not good.
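
The sequence above can be replayed with a small toy model. This is only a
sketch, not lockd code: the LockServer class, the XID values 100 and 200,
and the rule that the client accepts the first reply matching its XID are
all assumptions made for illustration.

```python
class LockServer:
    """Toy lock table for a single file region (hypothetical, not lockd)."""

    def __init__(self, holder=None):
        # Name of the client the server believes holds the lock, or None.
        self.holder = holder

    def handle(self, request):
        op, client, xid = request
        if op == "LOCK":
            if self.holder in (None, client):
                self.holder = client
                return (xid, "OK")
            return (xid, "DENIED")
        if op == "UNLOCK":
            if self.holder == client:
                self.holder = None
            return (xid, "OK")
        raise ValueError(f"unknown operation: {op}")


def simulate_race():
    server = LockServer(holder="A")
    # Requests in the order they queued up while the lockd thread stalled.
    queued = [
        ("LOCK", "B", 100),    # original non-blocking LOCK from client-B
        ("UNLOCK", "A", 200),  # client-A releases the lock during the stall
        ("LOCK", "B", 100),    # client-B's retransmit, same XID
    ]
    replies = [server.handle(request) for request in queued]
    # Assumed client behaviour: believe the first reply carrying XID 100.
    client_b_view = next(status for xid, status in replies if xid == 100)
    return replies, client_b_view, server.holder


if __name__ == "__main__":
    replies, client_b_view, holder = simulate_race()
    print("replies:", replies)
    print("client-B believes:", client_b_view)
    print("server believes holder is:", holder)
```

In this model client-B ends up believing DENIED while the server records
client-B as the holder, which is the disagreement described above.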
I think this explains a locking problem that a customer is seeing. The
application seems to busy-wait for the lock using non-blocking LOCK
requests. Each LOCK request has a different 'svid' so I assume each comes
from a different process. If you busy-wait from the one process this
problem won't occur.
Having a reply-cache on the server lockd might help, but such things
easily fill up and cannot provide a guarantee.
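
To illustrate why a bounded reply cache cannot provide a guarantee, here is
a minimal sketch. The ReplyCache class, its capacity, and the
(client, xid) key are hypothetical choices for the example, not the design
of any existing server.

```python
from collections import OrderedDict


class ReplyCache:
    """Hypothetical bounded reply cache keyed by (client, xid)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def lookup(self, client, xid):
        # Returns the cached reply, or None if it was never stored or
        # has already been evicted.
        return self.entries.get((client, xid))

    def store(self, client, xid, reply):
        key = (client, xid)
        self.entries[key] = reply
        self.entries.move_to_end(key)
        # Evict the oldest entries once the cache is over capacity.
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)


if __name__ == "__main__":
    cache = ReplyCache(capacity=2)
    cache.store("B", 100, "DENIED")
    # Unrelated traffic arrives before client-B's retransmit is handled.
    cache.store("C", 7, "OK")
    cache.store("D", 9, "OK")
    # The entry for client-B is gone, so a retransmit would be re-executed.
    print(cache.lookup("B", 100))
```

Once enough other requests have been cached, the entry for the original
LOCK is evicted and a late retransmit is treated as a new request.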
Having a longer timeout on the client would probably help too. At the very
least we should increase the maximum timeout beyond 20 seconds.
(Assuming I am reading the code correctly, the client resend timeout is
based on nlmsvc_timeout, which is set from nlm_timeout, which is restricted
to the range 3-20.)
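
The effect of the timeout on this scenario is simple arithmetic, sketched
below. The helper name is hypothetical, and the model assumes a fixed
resend interval with no backoff, which may not match the real client.

```python
import math


def retransmits_during_stall(stall_seconds, timeout_seconds):
    """Count resends issued strictly before the stall ends.

    Assumes the client resends every timeout_seconds with no backoff
    (an assumption for this sketch, not verified against the kernel).
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout must be positive")
    return max(0, math.ceil(stall_seconds / timeout_seconds) - 1)


if __name__ == "__main__":
    # 11 second stall with the 10 second default, then the 20 second maximum.
    print(retransmits_during_stall(11, 10))
    print(retransmits_during_stall(11, 20))
```

With the 10 second default an 11 second stall produces one retransmit; at
20 seconds it produces none, but a longer stall would still trigger one.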
Forcing the xid to change on every retransmit (for NLM) would ensure that
we only accept the last reply, which I think is safe.
That sounds like a good solution to me. Since the requests are non-blocking,
each request should be considered separate from the others.
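
The proposal to change the XID on every retransmit can be sketched with the
same kind of toy model. Everything here is an assumption for illustration:
the XID values, the inline lock table, and the rule that client-B only
accepts a reply matching its most recent XID. It shows the intended
outcome only and says nothing about whether the change is safe in general.

```python
def simulate_fresh_xid():
    holder = "A"       # server's view of who holds the lock
    latest_xid = 101   # client-B's resend used a new XID; 100 is forgotten
    queued = [
        ("LOCK", "B", 100),    # original request
        ("UNLOCK", "A", 200),  # client-A releases the lock
        ("LOCK", "B", 101),    # resend with a fresh XID
    ]
    accepted = None
    for op, client, xid in queued:
        if op == "LOCK":
            if holder in (None, client):
                holder = client
                status = "OK"
            else:
                status = "DENIED"
        else:  # UNLOCK
            if holder == client:
                holder = None
            status = "OK"
        # Client-B drops any reply that does not carry its latest XID.
        if client == "B" and xid == latest_xid:
            accepted = status
    return accepted, holder


if __name__ == "__main__":
    accepted, holder = simulate_fresh_xid()
    print("client-B believes:", accepted)
    print("server believes holder is:", holder)
```

Here the stale DENIED reply is discarded, so client-B and the server both
end up agreeing that client-B holds the lock.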
I totally disagree. To issue a new XID contradicts the entire notion of
"retransmit". It will badly break any hope of idempotency.
To me, there are two issues here:
1) The client should not be retransmitting on an unbroken connection.
2) The server should have a reply cache.
If both of those were true, this problem would not occur.
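
As a rough illustration of the second point, the sketch below replays the
same request sequence against a server that answers duplicates from a
reply cache. The unbounded dict keyed by (client, xid) is an assumption
made for brevity; a real cache would be bounded and would likely match on
more than the XID.

```python
def simulate_with_reply_cache():
    holder = "A"   # server's view of who holds the lock
    cache = {}     # hypothetical reply cache: (client, xid) -> status
    queued = [
        ("LOCK", "B", 100),    # original request
        ("UNLOCK", "A", 200),  # client-A releases the lock
        ("LOCK", "B", 100),    # retransmit with the same XID
    ]
    replies = []
    for op, client, xid in queued:
        key = (client, xid)
        if key in cache:
            # Duplicate: replay the earlier reply without re-executing.
            replies.append((client, xid, cache[key]))
            continue
        if op == "LOCK":
            if holder in (None, client):
                holder = client
                status = "OK"
            else:
                status = "DENIED"
        else:  # UNLOCK
            if holder == client:
                holder = None
            status = "OK"
        cache[key] = status
        replies.append((client, xid, status))
    return replies, holder


if __name__ == "__main__":
    replies, holder = simulate_with_reply_cache()
    print("replies:", replies)
    print("server believes holder is:", holder)
```

Both replies to client-B are DENIED and the server records no holder, so
the two sides agree. With the first point in place the duplicate would
never be sent on an unbroken connection at all.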
That said, if client B were to *drop the connection* and then *reissue*
the lock with a new XID, there would be a chance of things working
as desired.
But this would still leave many existing NLM issues on the table. It's
a pipe dream that NLM (and NSM) will truly support correct locking
semantics in the face of transient errors.