On Thu, Nov 24 2016, Steve Dickson wrote: > On 11/22/2016 05:43 PM, NeilBrown wrote: >> On Wed, Nov 23 2016, Steve Dickson wrote: >> >>> [Resent due to mailman rejecting the HTML subpart] >> (and the resend included HTML too ... how embarrassing :-) > Yeah... :-) I guess an upgrade turned it on.. > >> >>> >>> Hey Neil, >>> >>> >>> On 08/18/2016 09:45 PM, NeilBrown wrote: >>>> Commit: bf66c9facb8e ("mounts.nfs: v2 and v3 background mounts should retry when server is down.") >>>> >>>> changed the behaviour of "bg" mounts so that RPC_PROGNOTREGISTERED, >>>> which maps to EOPNOTSUPP, is not a permanent error. >>>> This useful because when an NFS server starts up there is a small window between >>>> the moment that rpcbind (or portmap) starts responding to lookup requests, >>>> and the moment when nfsd registers with rpcbind. During that window >>>> rpcbind will reply with RPC_PROGNOTREGISTERED, but mount should not give up. >>>> >>>> This same reasoning applies to foreground mounts. They don't wait for >>>> as long, but could still hit the window and fail prematurely. >>>> >>>> So revert the above patch and instead add EOPNOTSUPP to the list of >>>> temporary errors known to nfs_is_permanent_error. >>>> >>>> Signed-off-by: NeilBrown <neilb@xxxxxxxx> >>>> --- >>>> utils/mount/stropts.c | 7 +++---- >>>> 1 file changed, 3 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c >>>> index 9de6794c6177..d5dfb5e4a669 100644 >>>> --- a/utils/mount/stropts.c >>>> +++ b/utils/mount/stropts.c >>>> @@ -948,6 +948,7 @@ static int nfs_is_permanent_error(int error) >>>> case ETIMEDOUT: >>>> case ECONNREFUSED: >>>> case EHOSTUNREACH: >>>> + case EOPNOTSUPP: /* aka RPC_PROGNOTREGISTERED */ >>> I think this introduced a regression... When the server does not support >>> a protocol, say UDP, this patch cause the mount to hang forever, >>> which I don't think we want. >> >> >> I think we do want it to wait a while so that the nfs server has a >> chance to start up. We have no guarantee that the NFS server will be >> registered with rpcbind before rpcbind responds to requests. > I do see this race but there it has to be a small window. With > Fedora its under seconds between the time rpcbind started > and the NFS server. > >> >> I disagree with the "hang forever" description. I just tested after >> disabling UDP on an nfs server, and the delay was 2 minutes, 5 seconds >> before a failure was reported. It might be longer when trying TCP on a >> server that only supports UDP. > Yeah I did not wait that long... You are much more of a patient man than I ;-) > I do think this is a regression. Going an from an instant failure to one > that takes over 2min is not a good thing... IMHO. > >> >> So I think the current behavior is correct. You might be able to argue >> that certain error codes should trigger a shorter timeout, but it would >> need a strong argument. > Going with the theory the window is very small, how about > a retry with a timeout then a failure? I started looking at changing the timeout and it wouldn't be too hard (if we can agree on a suitable delay), but I feel I must ask why this is important. In what situation are you likely to mount with the wrong protocol, that you aren't able to just Ctrl-C when you realized what a dumb thing you just did? If rpcbind isn't running, which is arguably a very similar situation (no protocols are register) we have always had a long timeout. Why is "just one protocol not registered" any different? Anyway, below is the patch I was working on. I stopped when I wasn't sure how to handle ECONNREFUSED. NeilBrown diff --git a/utils/mount/stropts.c b/utils/mount/stropts.c index d5dfb5e4a669..084776115b9f 100644 --- a/utils/mount/stropts.c +++ b/utils/mount/stropts.c @@ -935,24 +935,30 @@ static int nfs_try_mount(struct nfsmount_info *mi) * failed so far, but fail immediately if there is a local * error (like a bad mount option). * - * ESTALE is also a temporary error because some servers - * return ESTALE when a share is temporarily offline. + * If there is a remote error, like ESTALE or RPC_PROGNOTREGISTERED + * then it is probably permanent, but there is a small chance + * the it is temporary can we caught the server at an awkward + * time during start-up. A shorter timeout is best for such + * circumstances, so return a distinct status. * - * Returns 1 if we should fail immediately, or 0 if we - * should retry. + * Returns PERMANENT if we should fail immediately, + * TEMPORARY if we should retry normally, or + * REMOTE if we should retry with shorter timeout. */ -static int nfs_is_permanent_error(int error) +enum error_type { PERMANENT, TEMPORARY, REMOTE }; +static enum error_type nfs_error_type(int error) { switch (error) { case ESTALE: + case EOPNOTSUPP: /* aka RPC_PROGNOTREGISTERED */ + return REMOTE; case ETIMEDOUT: case ECONNREFUSED: case EHOSTUNREACH: - case EOPNOTSUPP: /* aka RPC_PROGNOTREGISTERED */ case EAGAIN: - return 0; /* temporary */ + return TEMPORARY; default: - return 1; /* permanent */ + return PERMANENT; } } @@ -967,6 +973,7 @@ static int nfsmount_fg(struct nfsmount_info *mi) { unsigned int secs = 1; time_t timeout; + int last_errno = 0; timeout = nfs_parse_retry_option(mi->options, NFS_DEF_FG_TIMEOUT_MINUTES); @@ -987,13 +994,22 @@ static int nfsmount_fg(struct nfsmount_info *mi) */ return EX_SUCCESS; - if (nfs_is_permanent_error(errno)) + switch(nfs_error_type(errno)) { + case PERMANENT: + timeout = 0; break; - - if (time(NULL) > timeout) { + case REMOTE: + if (errno == last_errno) + timeout = 0; + break; + case TEMPORARY: errno = ETIMEDOUT; break; } + last_errno = errno; + + if (time(NULL) > timeout) + break; if (errno != ETIMEDOUT) { if (sleep(secs)) @@ -1020,7 +1036,7 @@ static int nfsmount_parent(struct nfsmount_info *mi) if (nfs_try_mount(mi)) return EX_SUCCESS; - if (nfs_is_permanent_error(errno)) { + if (nfs_error_type(errno) == PERMANENT) { mount_error(mi->spec, mi->node, errno); return EX_FAIL; } @@ -1055,8 +1071,14 @@ static int nfsmount_child(struct nfsmount_info *mi) if (nfs_try_mount(mi)) return EX_SUCCESS; - if (nfs_is_permanent_error(errno)) + switch (nfs_error_type(errno)) { + case REMOTE: /* Doesn't hurt to keep trying remote errors + * when in the background + */ + case PERMANENT: + timeout = 0; break; + } if (time(NULL) > timeout) break;
Attachment:
signature.asc
Description: PGP signature