Re: [PATCH 2/2] mount: RPC_PROGNOTREGISTERED should not be a permanent error

Steve Dickson <SteveD@xxxxxxxxxx> · Wed, 23 Nov 2016 13:21:48 -0500



On 11/22/2016 05:43 PM, NeilBrown wrote:
> On Wed, Nov 23 2016, Steve Dickson wrote:
> 
>> [Resent due to mailman rejecting the HTML subpart]
> (and the resend included HTML too ... how embarrassing :-)
Yeah... :-) I guess an upgrade turned it on.. 

> 
>>
>> Hey Neil,
>>
>>
>> On 08/18/2016 09:45 PM, NeilBrown wrote:
>>> Commit: bf66c9facb8e ("mounts.nfs: v2 and v3 background mounts should retry when server is down.")
>>>
>>> changed the behaviour of "bg" mounts so that RPC_PROGNOTREGISTERED,
>>> which maps to EOPNOTSUPP, is not a permanent error.
>>> This is useful because when an NFS server starts up there is a small window between
>>> the moment that rpcbind (or portmap) starts responding to lookup requests,
>>> and the moment when nfsd registers with rpcbind.  During that window
>>> rpcbind will reply with RPC_PROGNOTREGISTERED, but mount should not give up.
>>>
>>> This same reasoning applies to foreground mounts.  They don't wait for
>>> as long, but could still hit the window and fail prematurely.
>>>
>>> So revert the above patch and instead add EOPNOTSUPP to the list of
>>> temporary errors known to nfs_is_permanent_error.
>>>
>>> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
>>> ---
>>>  utils/mount/stropts.c |    7 +++----
>>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
>>> index 9de6794c6177..d5dfb5e4a669 100644
>>> --- a/utils/mount/stropts.c
>>> +++ b/utils/mount/stropts.c
>>> @@ -948,6 +948,7 @@ static int nfs_is_permanent_error(int error)
>>>  	case ETIMEDOUT:
>>>  	case ECONNREFUSED:
>>>  	case EHOSTUNREACH:
>>> +	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
>> I think this introduced a regression... When the server does not support
>> a protocol, say UDP, this patch causes the mount to hang forever,
>> which I don't think we want.
> 
> 
> I think we do want it to wait a while so that the nfs server has a
> chance to start up.  We have no guarantee that the NFS server will be
> registered with rpcbind before rpcbind responds to requests.
I do see this race, but it has to be a small window. With
Fedora it's seconds at most between the time rpcbind starts
and the time the NFS server starts.

> 
> I disagree with the "hang forever" description.  I just tested after
> disabling UDP on an nfs server, and the delay was 2 minutes, 5 seconds
> before a failure was reported.  It might be longer when trying TCP on a
> server that only supports UDP.
Yeah, I did not wait that long... You are a much more patient man than I ;-)
I do think this is a regression. Going from an instant failure to one
that takes over 2min is not a good thing... IMHO.
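
To make the behaviour change concrete, here is a simplified sketch of the
classification being discussed. It is not the actual stropts.c code: the
function name is made up, and only the four case labels visible in the
diff above are taken from the patch. Everything else is assumed to be
permanent for the purposes of the illustration.

```c
#include <assert.h>
#include <errno.h>

/*
 * Illustrative stand-in for nfs_is_permanent_error() (hypothetical name).
 * Returns 0 for errors worth retrying, 1 for errors that fail at once.
 */
static int is_permanent_error_sketch(int error)
{
	switch (error) {
	case ETIMEDOUT:
	case ECONNREFUSED:
	case EHOSTUNREACH:
	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED, added by the patch */
		return 0;	/* temporary: the caller keeps retrying */
	default:
		return 1;	/* permanent: the caller gives up immediately */
	}
}
```

With EOPNOTSUPP in the temporary list, a server that will never offer the
requested protocol is retried for the whole mount timeout instead of
failing straight away.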

> 
> So I think the current behavior is correct.  You might be able to argue
> that certain error codes should trigger a shorter timeout, but it would
> need a strong argument.
Going with the theory that the window is very small, how about
a retry with a timeout, then a failure?
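
Roughly what I have in mind, as an untested sketch rather than a patch.
The callback type, the function names and the grace/delay parameters are
all hypothetical; nothing here exists in nfs-utils. The idea is to retry
only while the attempt reports EOPNOTSUPP, and only until a grace period
has run out:

```c
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical callback: returns 0 on success or an errno value. */
typedef int (*mount_attempt_fn)(void *ctx);

/*
 * Retry while the attempt reports EOPNOTSUPP (RPC_PROGNOTREGISTERED),
 * but only until grace_secs have elapsed.  Any other result, success
 * included, is returned to the caller at once.
 */
static int try_mount_with_grace(mount_attempt_fn attempt, void *ctx,
				unsigned int grace_secs,
				unsigned int delay_secs)
{
	time_t deadline = time(NULL) + grace_secs;
	int err;

	assert(attempt != NULL);

	for (;;) {
		err = attempt(ctx);
		if (err != EOPNOTSUPP)
			return err;	/* success or an unrelated error */
		if (time(NULL) >= deadline)
			return err;	/* grace period over: report failure */
		sleep(delay_secs);
	}
}

/*
 * Illustrative fake server: reports EOPNOTSUPP until the counter that
 * ctx points to reaches zero, then succeeds.
 */
static int fake_attempt(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return EOPNOTSUPP;
	}
	return 0;
}
```

A server that registers after a couple of attempts would still mount,
while one that never registers the protocol fails once the grace period
is over instead of waiting out the full timeout.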

> 
> Or maybe you mean that a "bg" mount would "hang forever" in the
> background?  I think that behavior is correct too.
I agree... "bg" mounts should hang longer than fg mounts,
but they shouldn't hang for something that will never happen,
like a protocol the server does not support.

steved.