On Oct 29, 2011, at 3:52 PM, David Flynn wrote:

> * Myklebust, Trond (Trond.Myklebust@xxxxxxxxxx) wrote:
>>> -----Original Message-----
>>> From: Chuck Lever [mailto:chuck.lever@xxxxxxxxxx]
>>> On Oct 29, 2011, at 2:47 PM, J. Bruce Fields wrote:
>>>> Yes, and it's not something I care that strongly about, really; my
>>>> only observation is that this sort of failure (an implementation
>>>> bug on one side or the other resulting in a loop) seems to have
>>>> been common (based on no hard data, just my vague memories of list
>>>> threads), and the results fairly obnoxious (possibly even for
>>>> unrelated hosts on the network).
>>>>
>>>> So if there's some simple way to fail more gracefully, it might be
>>>> helpful.
>>>
>>> For what it's worth, I agree that client implementations should
>>> attempt to behave more gracefully in the face of server problems,
>>> whether those are the result of bugs or of other issues specific to
>>> that server.  Problems like this make NFSv4 as a protocol look bad.
>>
>> I can't see what a client can do in this situation except possibly
>> give up after a while and throw a SERVER_BROKEN error (which means
>> data loss).  That still won't make NFSv4 look good...
>
> Indeed, it is quite the dilemma.
>
> I agree that giving up and guaranteeing unattended data loss is bad
> (data loss at the behest of an operator is OK; after all, they can
> always fence a broken machine).

David, what would help immensely is if you could find a reliable way of
reproducing this.  So far we have been unable to find a reproducer.

> Looking at some of the logs again, even going back to the very
> original case, it appears to be about 600us between retries
> (RTT=400us).  Is there any way to make that less aggressive, e.g. 1s?
> That would reduce the impact by three orders of magnitude.  What would
> be the downside?  How often do you expect to get a BAD_STATEID error?
>
> ..david

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
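
[Editor's note: for readers following the retry-interval question above, here is a minimal sketch of what "less aggressive" could look like: a capped exponential backoff between retries instead of retrying as fast as the RTT allows. This is not the actual Linux NFS client recovery path; the function names, constants, and values below are illustrative assumptions only.]

#include <stdio.h>

/* Hypothetical parameters; names and values are illustrative only. */
#define RETRY_DELAY_MIN_MS	100	/* first retry after 100 ms */
#define RETRY_DELAY_MAX_MS	15000	/* never wait longer than 15 s */

/* Return the next retry delay: double the previous one, up to a cap. */
static unsigned long next_retry_delay_ms(unsigned long prev_ms)
{
	if (prev_ms == 0)
		return RETRY_DELAY_MIN_MS;
	if (prev_ms >= RETRY_DELAY_MAX_MS / 2)
		return RETRY_DELAY_MAX_MS;
	return prev_ms * 2;
}

int main(void)
{
	unsigned long delay = 0;
	int i;

	/* Prints 100, 200, 400, ... then stays capped at 15000 ms. */
	for (i = 0; i < 10; i++) {
		delay = next_retry_delay_ms(delay);
		printf("retry %d after %lu ms\n", i + 1, delay);
	}
	return 0;
}

[Even the first step of such a backoff would already be far less aggressive than the ~600us retry interval observed in the logs, while a cap keeps recovery from stalling indefinitely.]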