Re: nfsd4: utime sometimes takes 40+ seconds to return (but on SLES11SP3 with kernel 3.0.82)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Sep 10, 2013 at 10:08:36PM +0000, Joschi Brauchle wrote:
> Am 10.09.2013 um 23:55 schrieb "J. Bruce Fields" <bfields@xxxxxxxxxxxx>:
> 
> > On Tue, Sep 10, 2013 at 09:48:12PM +0000, Joschi Brauchle wrote:
> >> Am 10.09.2013 um 22:35 schrieb "J. Bruce Fields" <bfields@xxxxxxxxxxxx>:
> >> 
> >>> On Tue, Sep 10, 2013 at 08:49:05PM +0200, Joschi Brauchle wrote:
> >>>> Hello everyone,
> >>>> 
> >>>> we are administrating an NFS high-availability cluster running on
> >>>> SLES11SP1 with kernel 2.6.32.59. Just recently, one of the cluster
> >>>> machines was updated to SLES11SP3 with kernel 3.0.82.
> >>>> 
> >>>> 
> >>>> We are now experiencing severe hangs on NFS clients when the
> >>>> SLES11SP3 server is running the NFS services. An strace on the
> >>>> hanging processes on the client side show that is is waiting up to
> >>>> 60+ seconds for a "utime()" call to complete.
> >>>> 
> >>>> 
> >>>> The problem we see is matching the problem described in the thread
> >>>> "v3.5 nfsd4 regression; utime sometimes takes 40+ seconds to
> >>>> return". If the NFS server is running on SLES11SP3, the little test
> >>>> program provided in this tread hangs at the "utime()" call for 60+
> >>>> seconds. It hangs each time it is run! It finishes right away with 0
> >>>> seconds delay is SLES11SP1 is providing NFS services, each time.
> >>>> 
> >>>> 
> >>>> Now, in the serverside logfiles of SLES11SP3 we see these messages
> >>>> (not so on SP1):
> >>>> --------------
> >>>> kernel: [99381.184976] RPC: AUTH_GSS upcall timed out.
> >>>> kernel: [99381.184978] Please check user daemon is running.
> >>>> --------------
> >>>> 
> >>>> We have always been running the NFS server without rpc.gssd on the
> >>>> server side, as the init script for the nfsserver also does not
> >>>> start rpc.gssd.
> >>>> 
> >>>> 
> >>>> Once we started rpc.gssd on the SLES11SP3 server, using the test
> >>>> utility on the client shows that the first call to "utime()"
> >>>> succeeds right away, the second call takes ~25s to complete. But
> >>>> now, any consecutive runs of the utility finish with no more delay.
> >>>> 
> >>>> 
> >>>> So can anyone confirm that with kernel 3.0+ the rpc.gssd daemon is
> >>>> also required on the server side for correct operation?
> >>>> 
> >>>> Has there been a change between kernel 2.6.32.59 and 3.0.x?
> >>>> 
> >>>> Thus, is the init script of the nfsserver in SLES11SP3 indeed
> >>>> missing to start rpc.gssd?
> >>> 
> >>> It should be starting rpc.gssd to allow callbacks, yes.
> >>> 
> >>> --b.
> >> 
> >> Ok, we will run rpc.gssd on the server. Thanks. 
> >> 
> >> Could you please comment on having the nfs clients hang on utime() calls is to be expected when *not* running rpc.gssd? Or is this a problem that needs to be investigated?
> > 
> > I think what happens is the utime call breaks a delegation, and the
> > delay is because the lack of gssd prevents the server from calling back
> > to the client to tell it that its delegation is broken, so the
> > delegation has to time out.
> > 
> > That said, the server does a null callback to the client to test whether
> > callbacks are working before it gives out any delegations, so I'm
> > surprised it wouldn't have noticed the broken callbacks then.
> > 
> > --b.
> 
> Is there any information I can provide to figure this out? At what time is the null callback sent to the client? Maybe I can tcpdump that sequence...

It happens when the client sends a SETCLIENTID, which I think will be
the first time it opens a file.  Running "rpcdebug -m nfsd -s proc" on
the server and then looking in the logs afterwards might also be
enlightening.

--b.
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux