Re: 2.6.38.6 - state manager constantly respawns

On Mon, May 16, 2011 at 04:20:59PM -0400, Dr. J. Bruce Fields wrote:
> On Mon, May 16, 2011 at 03:54:16PM -0400, Trond Myklebust wrote:
> > On Mon, 2011-05-16 at 12:48 -0700, Harry Edmon wrote:
> > > On 05/16/11 12:43, Trond Myklebust wrote:
> > > > On Mon, 2011-05-16 at 12:36 -0700, Harry Edmon wrote:
> > > >    
> > > >> On 05/16/11 12:22, Chuck Lever wrote:
> > > >>      
> > > >>> On May 16, 2011, at 3:12 PM, Harry Edmon wrote:
> > > >>>
> > > >>>
> > > >>>        
> > > >>>> Attached is 1000 lines of output from tshark when the problem is occurring.   The client and server are connected by a private ethernet.
> > > >>>>
> > > >>>>          
> > > >>> Disappointing: tshark is not telling us the return codes.  However, I see "PUTFH;READ" then "RENEW" in a loop, which indicates the state manager thread is being kicked off because of ongoing difficulties with state recovery.  Is there a stuck application on that client?
> > > >>>
> > > >>> Try again with "tshark -V".
> > > >>>
> > > >>>        
> > > >> Here is the output from tshark -V (first 50,000 lines).   Nothing
> > > >> appears to be stuck, and as I said when I reboot the client into 2.6.32
> > > >> the problem goes away, only to reappear when I reboot it back into 2.6.38.6.
> > > >>
> > > >>      
> > > > Possibly, but it definitely indicates a server bug. What kind of server
> > > > are you using?
> > > >
> > > > Basically, the client is getting confused because when it sends a READ,
> > > > the server is telling it that the lease has expired, then when it sends
> > > > a RENEW, the same server replies that the lease is OK...
> > > >
> > > > Trond
> > > >    
> > > The server is running the 2.6.38.6 kernel with Debian squeeze, just like 
> > > the client.   The kernel config is attached.
> > 
> > Bruce, any idea how the server might get into this state?
> 
> So READ is getting ESTALE

Err, sorry, EXPIRED.

> and RENEW is getting OK?  And we're positive
> that the stateid on the READ is derived from the clientid sent with the
> RENEW?
> 
> OK, I'll look at the capture....

Hm, so the RENEWs all have clid 465ccc4d09000000, and the READs all have
a stateid (0, 465ccc4dc24c0a0000000000).

So the first 4 bytes matching just tells me both were handed out by the
same server instance (so there was no server reboot in between); there's
no way for me to tell whether they really belong to the same client.
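
A rough Python sketch of that comparison, assuming (as the matching
prefixes suggest) that the first 4 bytes of both the clientid and the
stateid's 12-byte opaque part identify the server instance. The helper
name is hypothetical and not an nfsd function.

```python
# Values copied from the capture.
CLIENTID = bytes.fromhex("465ccc4d09000000")               # sent with RENEW
STATEID_OTHER = bytes.fromhex("465ccc4dc24c0a0000000000")  # sent with READ


def same_server_instance(clientid: bytes, stateid_other: bytes) -> bool:
    """True if both values carry the same 4-byte server-instance prefix.

    Assumption: the prefix is the only part shared between the two, so a
    match says nothing about which client owns the stateid.
    """
    return clientid[:4] == stateid_other[:4]


print(same_server_instance(CLIENTID, STATEID_OTHER))  # True
```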

The server does assume that any stateid from the current server instance
that no longer exists in its table is expired.  I believe that's
correct, given a correctly functioning client, but perhaps I'm missing a
case.
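
A toy Python model of that rule, showing how the server could answer
EXPIRED to the READ and OK to the RENEW at the same time if the stateid
has dropped out of its table while the client record remains.
`ToyServer`, its fields, and the simplified RENEW handling are
hypothetical illustrations, not nfsd's actual data structures.

```python
from enum import Enum


class Status(Enum):
    OK = "NFS4_OK"
    STALE_CLIENTID = "NFS4ERR_STALE_CLIENTID"
    STALE_STATEID = "NFS4ERR_STALE_STATEID"
    EXPIRED = "NFS4ERR_EXPIRED"


class ToyServer:
    """Hypothetical model of the stateid rule above; not nfsd code."""

    def __init__(self, boot: bytes):
        self.boot = boot       # 4-byte tag identifying this server instance
        self.clients = set()   # clientids the server still knows about
        self.stateids = set()  # stateid "other" fields still in the table

    def check_stateid(self, other: bytes) -> Status:
        if other[:4] != self.boot:
            return Status.STALE_STATEID  # handed out by an earlier instance
        if other not in self.stateids:
            return Status.EXPIRED        # current instance, but gone from the table
        return Status.OK

    def renew(self, clientid: bytes) -> Status:
        # Simplified RENEW: ignores callback-path and other real-world checks.
        if clientid[:4] != self.boot:
            return Status.STALE_CLIENTID
        return Status.OK if clientid in self.clients else Status.EXPIRED


# One way to get the pattern in the capture: client known, stateid not.
srv = ToyServer(bytes.fromhex("465ccc4d"))
clid = bytes.fromhex("465ccc4d09000000")
read_stateid = bytes.fromhex("465ccc4dc24c0a0000000000")
srv.clients.add(clid)

print(srv.check_stateid(read_stateid).name)  # EXPIRED
print(srv.renew(clid).name)                  # OK
```

Under this model, a client that reacts to EXPIRED by renewing and then
retrying the same stateid would loop indefinitely, which would match the
PUTFH;READ / RENEW pattern in the trace.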

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html