Re: [PATCH 2/2] nfsd: return ENOSPC if unable to allocate a session slot

Chuck Lever <chucklever@xxxxxxxxx> · Mon, 25 Jun 2018 11:39:15 -0400

> On Jun 24, 2018, at 9:56 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote:
> 
> On Sat, 2018-06-23 at 15:00 -0400, Chuck Lever wrote:
>>> On Jun 22, 2018, at 6:31 PM, Trond Myklebust <trondmy@hammerspace.com> wrote:
>>> 
>>> On Fri, 2018-06-22 at 17:49 -0400, Chuck Lever wrote:
>>>> Hi Bruce-
>>>> 
>>>> 
>>>>> On Jun 22, 2018, at 1:54 PM, J. Bruce Fields <bfields@fieldses.org> wrote:
>>>>> 
>>>>> On Thu, Jun 21, 2018 at 04:35:33PM +0000, Manjunath Patil wrote:
>>>>>> Presently nfserr_jukebox is being returned by nfsd for create_session
>>>>>> request if server is unable to allocate a session slot. This may be
>>>>>> treated as NFS4ERR_DELAY by the clients and which may continue to re-try
>>>>>> create_session in loop leading NFSv4.1+ mounts in hung state. nfsd
>>>>>> should return nfserr_nospc in this case as per rfc5661(section-18.36.4
>>>>>> subpoint 4. Session creation).
>>>>> 
>>>>> I don't think the spec actually gives us an error that we can use to say
>>>>> a CREATE_SESSION failed permanently for lack of resources.
>>>> 
>>>> The current situation is that the server replies NFS4ERR_DELAY,
>>>> and the client retries indefinitely. The goal is to let the
>>>> client choose whether it wants to try the CREATE_SESSION again,
>>>> try a different NFS version, or fail the mount request.
>>>> 
>>>> Bill and I both looked at this section of RFC 5661. It seems to
>>>> us that the use of NFS4ERR_NOSPC is appropriate and unambiguous
>>>> in this situation, and it is an allowed status for the
>>>> CREATE_SESSION operation. NFS4ERR_DELAY OTOH is not helpful.
>>> 
>>> There are a range of errors which we may need to handle by destroying
>>> the session, and then creating a new one (mainly the ones where the
>>> client and server slot handling get out of sync). That's why returning
>>> NFS4ERR_NOSPC in response to CREATE_SESSION is unhelpful, and is why
>>> the only sane response by the client will be to treat it as a temporary
>>> error.
>>> IOW: these patches will not be acceptable, even with a rewrite, as they
>>> are based on a flawed assumption.
>> 
>> Fair enough. We're not attached to any particular solution/fix.
>> 
>> So let's take "recovery of an active mount" out of the picture
>> for a moment.
>> 
>> The narrow problem is behavioral: during initial contact with an
>> unfamiliar server, the server can hold off a client indefinitely
>> by sending NFS4ERR_DELAY (for example, until another client unmounts).
>> We want to find a way to allow clients to make progress when a
>> server is short of resources.
>> 
>> It appears that the mount(2) system call does not return as long
>> as the server is still returning NFS4ERR_DELAY. Possibly user
>> space is never given an opportunity to stop retrying, and thus
>> mount.nfs gets stuck.
>> 
>> It appears that DELAY is OK for EXCHANGE_ID too. So if a server
>> decides to return DELAY to EXCHANGE_ID, I wonder if our client's
>> trunking detection would be hamstrung by one bad server...
> 
> The 'mount' program has the 'retry' option in order to set a timeout
> for the mount operation itself. Is that option not working correctly?

Manjunath will need to confirm that, but my understanding is that
mount.nfs is not regaining control when the server returns DELAY
to CREATE_SESSION. My conclusion was that mount(2) is not returning.

> If so, we should definitely fix that.

My recollection is that mount.nfs polls; it does not set a timer
signal. So it will call mount(2) repeatedly until either "retry"
minutes have passed, or mount(2) succeeds. I don't think it will
deal with mount(2) not returning, but I could be wrong about that.
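
To illustrate the polling behavior I am describing, here is a rough
sketch in Python. It is not the mount.nfs source: try_mount is a
hypothetical callable standing in for a single mount(2) call, and the
10-second pause between attempts is an assumption made only for the
example. The clock and sleep functions are parameters so the loop can
be exercised without waiting.

```python
import time


def mount_with_retry(try_mount, retry_minutes,
                     now=time.monotonic, sleep=time.sleep, pause=10):
    """Poll try_mount() until it succeeds or retry_minutes have elapsed.

    try_mount is a hypothetical stand-in for one mount(2) call and
    returns True on success. Returns (succeeded, attempts).
    """
    deadline = now() + retry_minutes * 60
    attempts = 0
    while True:
        attempts += 1
        # If this call never returns, the deadline check below is never
        # reached. No timer signal interrupts the call.
        if try_mount():
            return True, attempts
        if now() >= deadline:
            return False, attempts
        sleep(pause)
```

The point of the sketch is the comment in the loop: the deadline is
only examined between attempts, so a polling caller depends on each
attempt returning in a bounded time.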

My preference would be to make the kernel more reliable (i.e., mount(2)
fails immediately in this case). That gives mount.nfs some time to
try other things (like, try the original mount again after a few
moments, or fall back to NFSv4.0, or fail).
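
As a sketch of what user space could do if each attempt failed
promptly (again Python, and again hypothetical: try_mount(version)
stands in for one mount(2) call with a given NFS version, and the
version list, attempt count, and pause are assumptions made for
illustration, not mount.nfs behavior):

```python
import time


def mount_with_fallback(try_mount, versions=("4.1", "4.0"),
                        attempts_per_version=2, sleep=time.sleep, pause=5):
    """Try each NFS version in order, retrying a few times before falling back.

    try_mount(version) is a hypothetical stand-in for one mount(2) call
    that returns promptly with True on success or False on failure.
    Returns the version that mounted, or None if every attempt failed.
    """
    for version in versions:
        for attempt in range(attempts_per_version):
            if try_mount(version):
                return version
            # Wait a few moments before retrying the same version, but
            # do not wait before moving on to the next version.
            if attempt + 1 < attempts_per_version:
                sleep(pause)
    return None
```

None of this is possible if the first attempt never comes back.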

We don't want mount.nfs to wait for the full retry= while doing
nothing else. That would make this particular failure mode behave
differently than all the other modes we have had, historically, IIUC.

Also, I agree with Bruce that the server should make CREATE_SESSION
less likely to fail. That would also benefit state recovery.

> We might also want to look into making it take values < 1 minute. That
> could be accomplished either by extending the syntax of the 'retry'
> option (e.g.: 'retry=<minutes>:<seconds>') or by adding a new option
> (e.g. 'sretry=<seconds>').
> 
> It would then be up to the caller of mount to decide the policy of what
> to do after a timeout.

I agree that the caller of mount(2) should be allowed to provide the
policy.
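
For concreteness, the extended 'retry=<minutes>:<seconds>' form
suggested above could be parsed roughly as follows. This is a Python
sketch of the proposed syntax only; no such option exists today as far
as this thread establishes, and parse_retry is a made-up name.

```python
def parse_retry(value):
    """Convert a 'retry' option value to a timeout in seconds.

    Accepts the existing '<minutes>' form and the proposed
    '<minutes>:<seconds>' form. Raises ValueError on anything else.
    """
    minutes, sep, seconds = value.partition(":")
    if not minutes.isdecimal() or (sep and not seconds.isdecimal()):
        raise ValueError("invalid retry value: %r" % value)
    return int(minutes) * 60 + (int(seconds) if sep else 0)
```

With this, 'retry=2' would still mean 120 seconds, and 'retry=0:30'
would give a 30-second timeout.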

> Renegotiation downward to NFSv3 might be an
> option, but it's not something that most people want to do in the case
> where there are lots of clients competing for resources since that's
> precisely the regime where the NFSv3 DRC scheme breaks down (lots of
> disconnections, combined with a high turnover of DRC slots).

--
Chuck Lever
chucklever@xxxxxxxxx
