Re: NFS4ERR_STALE_CLIENTID loop

Chuck Lever <chuck.lever@xxxxxxxxxx> · Mon, 31 Oct 2011 09:39:42 -0400

On Oct 31, 2011, at 9:21 AM, David Flynn wrote:

> * Chuck Lever (chuck.lever@xxxxxxxxxx) wrote:
>> David, what would help immensely is if you can find a reliable way of
>> reproducing this.  So far we have been unable to find a reproducer.
> 
> While i've managed to have problems with individual machines, which seem
> to fail at some random point of their own choosing, the most reliable
> way for us to produce the problem is to have a number of nodes updating
> various RRD files frequently.
> 
> Given that i haven't found a reliable and short method for reproducing
> it either, if we were to re-run the known case and capture all network
> traffic, would you be able to extract the relevant detail to generate a
> simulation?

A reproducer would be better for us [*], but I understand the arbitrary nature of the problem.  A network trace would be an excellent start.

Now, it would be interesting if in fact the problem occurs only when multiple clients interact with a server.  In that case, capture a full network trace with snoop on your server.  We'll worry about pruning the size of the trace once you have a clean capture.

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

* - A reproducer allows us to perform internal-only tests at will, and it also can confirm we've got the problem properly fixed.
