Re: [PATCH 2/2] nfsd: return ENOSPC if unable to allocate a session slot

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Sun, 24 Jun 2018 13:56:46 +0000

On Sat, 2018-06-23 at 15:00 -0400, Chuck Lever wrote:
> > On Jun 22, 2018, at 6:31 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
> > 
> > On Fri, 2018-06-22 at 17:49 -0400, Chuck Lever wrote:
> > > Hi Bruce-
> > > 
> > > 
> > > > On Jun 22, 2018, at 1:54 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
> > > > 
> > > > On Thu, Jun 21, 2018 at 04:35:33PM +0000, Manjunath Patil wrote:
> > > > > Presently nfserr_jukebox is being returned by nfsd for
> > > > > create_session request if server is unable to allocate a session
> > > > > slot. This may be treated as NFS4ERR_DELAY by the clients and
> > > > > which may continue to re-try create_session in loop leading
> > > > > NFSv4.1+ mounts in hung state. nfsd should return nfserr_nospc
> > > > > in this case as per rfc5661(section-18.36.4 subpoint 4. Session
> > > > > creation).
> > > > 
> > > > I don't think the spec actually gives us an error that we can use
> > > > to say a CREATE_SESSION failed permanently for lack of resources.
> > > 
> > > The current situation is that the server replies NFS4ERR_DELAY,
> > > and the client retries indefinitely. The goal is to let the
> > > client choose whether it wants to try the CREATE_SESSION again,
> > > try a different NFS version, or fail the mount request.
> > > 
> > > Bill and I both looked at this section of RFC 5661. It seems to
> > > us that the use of NFS4ERR_NOSPC is appropriate and unambiguous
> > > in this situation, and it is an allowed status for the
> > > CREATE_SESSION operation. NFS4ERR_DELAY OTOH is not helpful.
> > 
> > There are a range of errors which we may need to handle by destroying
> > the session, and then creating a new one (mainly the ones where the
> > client and server slot handling get out of sync). That's why returning
> > NFS4ERR_NOSPC in response to CREATE_SESSION is unhelpful, and is why
> > the only sane response by the client will be to treat it as a
> > temporary error.
> > IOW: these patches will not be acceptable, even with a rewrite, as
> > they are based on a flawed assumption.
> 
> Fair enough. We're not attached to any particular solution/fix.
> 
> So let's take "recovery of an active mount" out of the picture
> for a moment.
> 
> The narrow problem is behavioral: during initial contact with an
> unfamiliar server, the server can hold off a client indefinitely
> by sending NFS4ERR_DELAY for example until another client unmounts.
> We want to find a way to allow clients to make progress when a
> server is short of resources.
> 
> It appears that the mount(2) system call does not return as long
> as the server is still returning NFS4ERR_DELAY. Possibly user
> space is never given an opportunity to stop retrying, and thus
> mount.nfs gets stuck.
> 
> It appears that DELAY is OK for EXCHANGE_ID too. So if a server
> decides to return DELAY to EXCHANGE_ID, I wonder if our client's
> trunking detection would be hamstrung by one bad server...

The 'mount' program has the 'retry' option in order to set a timeout
for the mount operation itself. Is that option not working correctly?
If it is not, we should definitely fix that.
We might also want to look into making it take values < 1 minute. That
could be accomplished either by extending the syntax of the 'retry'
option (e.g. 'retry=<minutes>:<seconds>') or by adding a new option
(e.g. 'sretry=<seconds>').
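
To make the first variant concrete, here is a rough sketch of how a
'retry=<minutes>:<seconds>' value might be parsed while still accepting
the existing plain '<minutes>' form. The helper name parse_retry_value()
and the rule that the seconds field must be 0-59 are assumptions made
for illustration only; none of this is taken from nfs-utils.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not existing nfs-utils code): parse the value of
 * a 'retry' option given either as "<minutes>" or as
 * "<minutes>:<seconds>".
 * On success, store the total timeout in seconds and return 0.
 * Return -1 on malformed input or overflow.
 */
int parse_retry_value(const char *val, long *timeout_secs)
{
	char *end;
	long minutes, seconds = 0;

	/* Require a leading digit: rejects "", signs and whitespace. */
	if (val == NULL || *val < '0' || *val > '9')
		return -1;

	errno = 0;
	minutes = strtol(val, &end, 10);
	if (errno != 0 || minutes > (LONG_MAX - 59) / 60)
		return -1;

	if (*end == ':') {
		const char *sec = end + 1;

		if (*sec < '0' || *sec > '9')
			return -1;

		errno = 0;
		seconds = strtol(sec, &end, 10);
		/* Assumed rule: the seconds field is limited to 0-59. */
		if (errno != 0 || seconds > 59)
			return -1;
	}

	/* Reject trailing garbage such as "1:30x". */
	if (*end != '\0')
		return -1;

	*timeout_secs = minutes * 60 + seconds;
	return 0;
}
```

With this, 'retry=2' would still mean 120 seconds, and 'retry=0:30'
would give the sub-minute timeout discussed above. A separate
'sretry=<seconds>' option would need only a single strtol() call, at
the cost of a second option to document.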

It would then be up to the caller of mount to decide the policy of what
to do after a timeout. Renegotiation downward to NFSv3 might be an
option, but it's not something that most people want to do in the case
where there are lots of clients competing for resources since that's
precisely the regime where the NFSv3 DRC scheme breaks down (lots of
disconnections, combined with a high turnover of DRC slots).

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx