Re: [PATCH 2/2] mount: RPC_PROGNOTREGISTERED should not be a permanent error

NeilBrown <neilb@xxxxxxxx> · Wed, 23 Nov 2016 09:43:34 +1100

On Wed, Nov 23 2016, Steve Dickson wrote:

> [Resent due to mailman rejecting the HTML subpart]
(and the resend included HTML too ... how embarrassing :-)

>
> Hey Neil,
>
>
> On 08/18/2016 09:45 PM, NeilBrown wrote:
>> Commit: bf66c9facb8e ("mounts.nfs: v2 and v3 background mounts should retry when server is down.")
>>
>> changed the behaviour of "bg" mounts so that RPC_PROGNOTREGISTERED,
>> which maps to EOPNOTSUPP, is not a permanent error.
>> This is useful because when an NFS server starts up there is a small window between
>> the moment that rpcbind (or portmap) starts responding to lookup requests,
>> and the moment when nfsd registers with rpcbind.  During that window
>> rpcbind will reply with RPC_PROGNOTREGISTERED, but mount should not give up.
>>
>> This same reasoning applies to foreground mounts.  They don't wait for
>> as long, but could still hit the window and fail prematurely.
>>
>> So revert the above patch and instead add EOPNOTSUPP to the list of
>> temporary errors known to nfs_is_permanent_error.
>>
>> Signed-off-by: NeilBrown <neilb@xxxxxxxx>
>> ---
>>  utils/mount/stropts.c |    7 +++----
>>  1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c
>> index 9de6794c6177..d5dfb5e4a669 100644
>> --- a/utils/mount/stropts.c
>> +++ b/utils/mount/stropts.c
>> @@ -948,6 +948,7 @@ static int nfs_is_permanent_error(int error)
>>  	case ETIMEDOUT:
>>  	case ECONNREFUSED:
>>  	case EHOSTUNREACH:
>> +	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
> I think this introduced a regression... When the server does not support
> a protocol, say UDP, this patch causes the mount to hang forever,
> which I don't think we want.

I think we do want it to wait a while so that the nfs server has a
chance to start up.  We have no guarantee that the NFS server will be
registered with rpcbind before rpcbind responds to requests.
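
For readers following along, the classification the patch touches can be
sketched roughly as below.  This is an illustration only, not the code in
utils/mount/stropts.c: the function name is hypothetical, and only the four
errno values visible in the diff hunk above are listed (the real function
may treat others as temporary too).

```c
#include <errno.h>

/*
 * Hypothetical sketch: returns 0 for errors worth retrying,
 * 1 for errors that should fail the mount immediately.
 * Only the cases visible in the quoted hunk are included.
 */
static int is_permanent_error_sketch(int error)
{
	switch (error) {
	case ETIMEDOUT:
	case ECONNREFUSED:
	case EHOSTUNREACH:
	case EOPNOTSUPP:	/* aka RPC_PROGNOTREGISTERED */
		return 0;	/* temporary: server may still be starting */
	default:
		return 1;	/* permanent: give up now */
	}
}
```

With this shape, a lookup that races with nfsd registration yields
EOPNOTSUPP and is retried rather than failing the mount outright.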

I disagree with the "hang forever" description.  I just tested after
disabling UDP on an nfs server, and the delay was 2 minutes, 5 seconds
before a failure was reported.  It might be longer when trying TCP on a
server that only supports UDP.
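
The difference between "hang forever" and "fail after a bounded delay" can
be shown with a minimal retry loop.  This is a sketch under stated
assumptions, not mount.nfs itself: retry_until_deadline(), struct
fake_server and fake_attempt() are all hypothetical names, and the timeout
and backoff values are whatever the caller passes in.

```c
#include <errno.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical stand-in for a server: fails 'failures_left' times with 'err'. */
struct fake_server {
	int failures_left;
	int err;
};

/* One simulated mount attempt; returns 0 on success or an errno value. */
static int fake_attempt(void *ctx)
{
	struct fake_server *s = ctx;

	if (s->failures_left > 0) {
		s->failures_left--;
		return s->err;
	}
	return 0;
}

/* Temporary errors are retried; this list mirrors the quoted diff hunk only. */
static int is_temporary(int err)
{
	return err == ETIMEDOUT || err == ECONNREFUSED ||
	       err == EHOSTUNREACH || err == EOPNOTSUPP;
}

/*
 * Retry try_once() until it succeeds, returns a permanent error,
 * or timeout_secs have elapsed.  Returns 0 or the last errno value.
 */
static int retry_until_deadline(int (*try_once)(void *), void *ctx,
				time_t timeout_secs, unsigned int backoff_secs)
{
	time_t deadline = time(NULL) + timeout_secs;

	for (;;) {
		int err = try_once(ctx);

		if (err == 0 || !is_temporary(err))
			return err;	/* success, or no point retrying */
		if (time(NULL) >= deadline)
			return err;	/* bounded wait: report the failure */
		sleep(backoff_secs);
	}
}
```

A server that never registers the program still produces a failure once the
deadline passes, so a temporary error delays the report rather than
suppressing it.  A delay of roughly two minutes, as measured above, would
correspond to a timeout_secs of about 120 in this sketch.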

So I think the current behavior is correct.  You might be able to argue
that certain error codes should trigger a shorter timeout, but it would
need a strong argument.

Or maybe you mean that a "bg" mount would "hang forever" in the
background?  I think that behavior is correct too.

Thanks,
NeilBrown