Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Carlos André <candrecn@xxxxxxxxx> · Mon, 24 Aug 2009 10:27:45 -0300

Hi Ian,

Thanks for the patch, and sorry for the delay (I was expecting to receive
your reply on the bug tracker, not here) :)

But this patch didn't work for me as expected...  :(

First I changed "#MOUNT_WAIT=-1" to "MOUNT_WAIT=10",
and later changed "10" to "2", with the same results...
(always restarting the service, of course :)

Then I tried removing "sec=krb5p", and later removed "nfs4", but I got
the same results again.

Or am I doing something wrong?
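
For reference, the behaviour expected from MOUNT_WAIT, going by the patch
description quoted below, is roughly the following. This is only a sketch in
Python, not the autofs code: spawn_with_wait() and the sleeping child that
stands in for a stuck mount(8) are made up for illustration.

```python
import subprocess
import sys


def spawn_with_wait(argv, wait):
    """Run argv; if it is still running after `wait` seconds, send SIGTERM.

    wait=None means "wait until the child returns", which is the role
    that -1 plays for MOUNT_WAIT in the patch description.
    Returns (exit_status, was_terminated).
    """
    proc = subprocess.Popen(argv)
    try:
        return proc.wait(timeout=wait), False
    except subprocess.TimeoutExpired:
        proc.terminate()          # SIGTERM on POSIX systems
        # We still have to wait for the child to actually exit.
        return proc.wait(), True


# Hypothetical stand-in for a mount(8) blocked on a dead server.
slow_child = [sys.executable, "-c", "import time; time.sleep(30)"]
status, killed = spawn_with_wait(slow_child, wait=2)
print(status, killed)
```

Note the second wait(): as the patch notes say, a real mount(8) may not answer
the TERM signal until its RPC timeout expires, so the limit is not a hard one.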

[root@KSTATION areas]# automount -V

Linux automount version 5.0.1-0.rc2.131.bz517349.1
[...]

[root@KSTATION areas]# time ls -la testdown
ls: testdown: No such file or directory

real    3m9.006s
user    0m0.002s
sys     0m0.000s

LOGGING:
-----------------------------------------
Aug 24 09:23:51 KSTATION automount[20803]: mount_mount: mount(nfs):
calling mount -t nfs4 -s -o rw,acl,sec=krb5p 1.2.3.4:/areas/testdown
/misc/areas/testdown
Aug 24 09:27:00 KSTATION automount[20803]: mount(nfs): nfs: mount
failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
Aug 24 09:27:00 KSTATION automount[20803]: ioctl_send_fail: token = 91
Aug 24 09:27:00 KSTATION automount[20803]: failed to mount /misc/areas/testdown
-----------------------------------------
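
For reference, the 3m9s above (09:23:51 to 09:27:00 in the log) is the same
189 seconds discussed earlier in the thread, so the mount wait does not seem
to be taking effect. A rough sketch of where that number comes from, assuming
the SYN timeout starts at 3 seconds and doubles on every retry as the tcpdump
intervals suggested (the function name and its parameters are just for
illustration):

```python
def syn_connect_seconds(syn_retries, initial_timeout=3):
    """Seconds until a TCP connect gives up on a silent server.

    Assumes one initial SYN plus `syn_retries` retransmissions, with the
    wait after each SYN doubling from `initial_timeout` seconds.
    """
    return sum(initial_timeout * 2 ** attempt
               for attempt in range(syn_retries + 1))


# tcp_syn_retries=5 (default) and tcp_syn_retries=1, as tested in the thread
for retries in (5, 1):
    print(retries, syn_connect_seconds(retries))
```

With tcp_syn_retries=5 this gives 3+6+12+24+48+96 = 189 seconds, and with
tcp_syn_retries=1 it gives 3+6 = 9 seconds, which matches the 0m9s seen with
retry=0 and no Kerberos.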

2009/8/17 Ian Kent <ikent@xxxxxxxxxx>:
> On Thu, 2009-08-13 at 12:18 -0300, Carlos André wrote:
>> Filed bug report:
>> https://bugzilla.redhat.com/show_bug.cgi?id=517349
>
> Hi Carlos,
>
> I have a patched source rpm to add a mount wait parameter to autofs
> located at:
> http://people.redhat.com/~ikent/autofs-5.0.1-0.rc2.131.bz517349.1
>
> Could you build it and see if it works?
> I haven't tested it at all but it is fairly straightforward.
> It is still unclear if this is the right way to do this and what the
> consequences are in sending a term signal to mount. This mount request
> will likely be followed by other requests for the same mount causing an
> accumulation of mount(8) processes waiting for RPC timeouts before they
> can answer the TERM signal.
>
> Anyway, for information the patch included in the source rpm above is:
>
> autofs-5.0.4 - add mount wait parameter
>
> From: Ian Kent <raven@xxxxxxxxxx>
>
> Often delays when trying to mount from a server that is not responding
> for some reason are undesirable. To try and prevent these delays we
> provide a configuration setting to limit the time that we wait for
> our spawned mount(8) process to complete before sending it a SIGTERM
> signal. This patch adds a configuration parameter to allow us to
> request we limit the time we wait for mount(8) to complete before
> sending it a TERM signal.
> ---
>
>  daemon/spawn.c                 |    3 ++-
>  include/defaults.h             |    2 ++
>  lib/defaults.c                 |   13 +++++++++++++
>  man/auto.master.5.in           |    7 +++++++
>  redhat/autofs.sysconfig.in     |    9 +++++++++
>  samples/autofs.conf.default.in |    9 +++++++++
>  6 files changed, 42 insertions(+), 1 deletion(-)
>
>
> --- autofs-5.0.1.orig/daemon/spawn.c
> +++ autofs-5.0.1/daemon/spawn.c
> @@ -312,6 +312,7 @@ int spawn_mount(unsigned logopt, ...)
>        unsigned int options;
>        unsigned int retries = MTAB_LOCK_RETRIES;
>        int update_mtab = 1, ret, printed = 0;
> +       unsigned int wait = defaults_get_mount_wait();
>        char buf[PATH_MAX];
>
>        /* If we use mount locking we can't validate the location */
> @@ -353,7 +354,7 @@ int spawn_mount(unsigned logopt, ...)
>        va_end(arg);
>
>        while (retries--) {
> -               ret = do_spawn(logopt, -1, options, prog, (const char **) argv);
> +               ret = do_spawn(logopt, wait, options, prog, (const char **) argv);
>                if (ret & MTAB_NOTUPDATED) {
>                        struct timespec tm = {3, 0};
>
> --- autofs-5.0.1.orig/include/defaults.h
> +++ autofs-5.0.1/include/defaults.h
> @@ -24,6 +24,7 @@
>
>  #define DEFAULT_TIMEOUT                        600
>  #define DEFAULT_NEGATIVE_TIMEOUT       60
> +#define DEFAULT_MOUNT_WAIT             -1
>  #define DEFAULT_UMOUNT_WAIT            12
>  #define DEFAULT_BROWSE_MODE            1
>  #define DEFAULT_LOGGING                        0
> @@ -62,6 +63,7 @@ struct ldap_schema *defaults_get_schema(
>  struct ldap_searchdn *defaults_get_searchdns(void);
>  void defaults_free_searchdns(struct ldap_searchdn *);
>  unsigned int defaults_get_append_options(void);
> +unsigned int defaults_get_mount_wait(void);
>  unsigned int defaults_get_umount_wait(void);
>  const char *defaults_get_auth_conf_file(void);
>  unsigned int defaults_get_map_hash_table_size(void);
> --- autofs-5.0.1.orig/lib/defaults.c
> +++ autofs-5.0.1/lib/defaults.c
> @@ -45,6 +45,7 @@
>  #define ENV_NAME_VALUE_ATTR            "VALUE_ATTRIBUTE"
>
>  #define ENV_APPEND_OPTIONS             "APPEND_OPTIONS"
> +#define ENV_MOUNT_WAIT                 "MOUNT_WAIT"
>  #define ENV_UMOUNT_WAIT                        "UMOUNT_WAIT"
>  #define ENV_AUTH_CONF_FILE             "AUTH_CONF_FILE"
>
> @@ -323,6 +324,7 @@ unsigned int defaults_read_config(unsign
>                    check_set_config_value(key, ENV_NAME_ENTRY_ATTR, value, to_syslog) ||
>                    check_set_config_value(key, ENV_NAME_VALUE_ATTR, value, to_syslog) ||
>                    check_set_config_value(key, ENV_APPEND_OPTIONS, value, to_syslog) ||
> +                   check_set_config_value(key, ENV_MOUNT_WAIT, value, to_syslog) ||
>                    check_set_config_value(key, ENV_UMOUNT_WAIT, value, to_syslog) ||
>                    check_set_config_value(key, ENV_AUTH_CONF_FILE, value, to_syslog) ||
>                    check_set_config_value(key, ENV_MAP_HASH_TABLE_SIZE, value, to_syslog))
> @@ -652,6 +654,17 @@ unsigned int defaults_get_append_options
>        return res;
>  }
>
> +unsigned int defaults_get_mount_wait(void)
> +{
> +       long wait;
> +
> +       wait = get_env_number(ENV_MOUNT_WAIT);
> +       if (wait < 0)
> +               wait = DEFAULT_MOUNT_WAIT;
> +
> +       return (unsigned int) wait;
> +}
> +
>  unsigned int defaults_get_umount_wait(void)
>  {
>        long wait;
> --- autofs-5.0.1.orig/man/auto.master.5.in
> +++ autofs-5.0.1/man/auto.master.5.in
> @@ -175,6 +175,13 @@ Set the default timeout for caching fail
>  60). If the equivalent command line option is given it will override this
>  setting.
>  .TP
> +.B MOUNT_WAIT
> +Set the default time to wait for a response from a spawned mount(8)
> +before sending it a SIGTERM. Note that we still need to wait for the
> +RPC layer to timeout before the sub-process exits so this isn't ideal
> +but it is the best we can do. The default is to wait until mount(8)
> +returns without intervention.
> +.TP
>  .B UMOUNT_WAIT
>  Set the default time to wait for a response from a spawned umount(8)
>  before sending it a SIGTERM. Note that we still need to wait for the
> --- autofs-5.0.1.orig/redhat/autofs.sysconfig.in
> +++ autofs-5.0.1/redhat/autofs.sysconfig.in
> @@ -14,6 +14,15 @@ TIMEOUT=300
>  #
>  #NEGATIVE_TIMEOUT=60
>  #
> +# MOUNT_WAIT - time to wait for a response from mount(8).
> +#             Setting this timeout can cause problems when
> +#             mount would otherwise wait for a server that
> +#             is temporarily unavailable, such as when it's
> +#             restarting. The default of waiting for mount(8)
> +#             usually results in a wait of around 3 minutes.
> +#
> +#MOUNT_WAIT=-1
> +#
>  # UMOUNT_WAIT - time to wait for a response from umount(8).
>  #
>  #UMOUNT_WAIT=12
> --- autofs-5.0.1.orig/samples/autofs.conf.default.in
> +++ autofs-5.0.1/samples/autofs.conf.default.in
> @@ -14,6 +14,15 @@ TIMEOUT=300
>  #
>  #NEGATIVE_TIMEOUT=60
>  #
> +# MOUNT_WAIT - time to wait for a response from mount(8).
> +#             Setting this timeout can cause problems when
> +#             mount would otherwise wait for a server that
> +#             is temporarily unavailable, such as when it's
> +#             restarting. The default of waiting for mount(8)
> +#             usually results in a wait of around 3 minutes.
> +#
> +#MOUNT_WAIT=-1
> +#
>  # UMOUNT_WAIT - time to wait for a response from umount(8).
>  #
>  #UMOUNT_WAIT=12
>
>
>>
>> Thanks!
>>
>> 2009/8/13 Carlos André <candrecn@xxxxxxxxx>:
>> > 2009/8/13 Ian Kent <ikent@xxxxxxxxxx>:
>> >> Carlos André wrote:
>> >>> Today (2009-08-12) I'm using:
>> >>> kernel-2.6.18-128.2.1.el5
>> >>> autofs-5.0.1-0.rc2.102.el5_3.1
>> >>
>> >> Thanks,
>> >>
>> >> My mistake, the wait time I was referring to is used for umounts during
>> >> expires and is present in rev rc2.102.
>> >>
>> >> It shouldn't be hard to add this for mount as well.
>> >> Would you like me to put something together?
>> >
>> > Sure! That'll help me a lot (and surely other people too) :) Thanks :)
>> >
>> >>
>> >> It would probably be good to test something out to see if we can make a
>> >> difference by killing mount after some configured timeout but, if
>> >> we make progress, probably the best way to deal with it is for you to
>> >> log a bug against rhel-5 so I can get it committed to the rhel package.
>> >> The possible issue is that I'm not sure if the RPC subsystem in the
>> >> above rhel kernel will respond well to process death with potential
>> >> outstanding requests. But we'll see.
>> >
>> > Ok, on my way :)
>> >
>> > Thanks a lot!
>> >
>> >>
>> >>>
>> >>>
>> >>> Here is my latest test:
>> >>> --------------------------------------------------------------
>> >>> [root@KSTATION areas]# time ls testdown
>> >>> ls: testdown: No such file or directory
>> >>>
>> >>> real    3m9.025s
>> >>> user    0m0.000s
>> >>> sys     0m0.002s
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> Aug 12 12:57:07 KSTATION automount[15471]: sun_mount: parse(sun):
>> >>> mounting root /misc/areas, mountpoint testdown, what
>> >>> 1.2.3.4:/areas/testdown, fstype nfs4, options
>> >>> acl,sec=krb5p,proto=tcp,retry=0
>> >>> Aug 12 12:57:07 KSTATION automount[15471]: do_mount:
>> >>> 1.2.3.4:/areas/testdown /misc/areas/testdown type nfs4 options
>> >>> acl,sec=krb5p,proto=tcp,retry=0 using module nfs4
>> >>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>> >>> root=/misc/areas name=testdown what=1.2.3.4:/areas/testdown,
>> >>> fstype=nfs4, options=acl,sec=krb5p,proto=tcp,retry=0
>> >>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>> >>> nfs options="acl,sec=krb5p,proto=tcp,retry=0", nosymlink=0, ro=0
>> >>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>> >>> calling mkdir_path /misc/areas/testdown
>> >>> Aug 12 12:57:07 KSTATION automount[15471]: mount_mount: mount(nfs):
>> >>> calling mount -t nfs4 -s -o acl,sec=krb5p,proto=tcp,retry=0
>> >>> 1.2.3.4:/areas/testdown /misc/areas/testdown
>> >>> Aug 12 12:58:12 KSTATION automount[15471]: st_expire: state 1 path /misc
>> >>> Aug 12 12:58:12 KSTATION automount[15471]: expire_proc: exp_proc =
>> >>> 3078093712 path /misc
>> >>> Aug 12 12:58:13 KSTATION automount[15471]: expire_proc_indirect: 2
>> >>> submounts remaining in /misc
>> >>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: got thid
>> >>> 3078093712 path /misc stat 3
>> >>> Aug 12 12:58:13 KSTATION automount[15471]: expire_cleanup: sigchld:
>> >>> exp 3078093712 finished, switching from 2 to 1
>> >>> Aug 12 12:58:13 KSTATION automount[15471]: st_ready: st_ready(): state
>> >>> = 2 path /misc
>> >>> Aug 12 12:59:28 KSTATION automount[15471]: st_expire: state 1 path /misc
>> >>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc: exp_proc =
>> >>> 3078093712 path /misc
>> >>> Aug 12 12:59:28 KSTATION automount[15471]: expire_proc_indirect: 2
>> >>> submounts remaining in /misc
>> >>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: got thid
>> >>> 3078093712 path /misc stat 3
>> >>> Aug 12 12:59:28 KSTATION automount[15471]: expire_cleanup: sigchld:
>> >>> exp 3078093712 finished, switching from 2 to 1
>> >>> Aug 12 12:59:28 KSTATION automount[15471]: st_ready: st_ready(): state
>> >>> = 2 path /misc
>> >>> Aug 12 13:00:16 KSTATION automount[15471]: >> mount: mount to NFS
>> >>> server '1.2.3.4' failed: timed out (giving up).
>> >>> Aug 12 13:00:16 KSTATION automount[15471]: mount(nfs): nfs: mount
>> >>> failure 1.2.3.4:/areas/testdown on /misc/areas/testdown
>> >>> Aug 12 13:00:16 KSTATION automount[15471]: send_fail: token = 17
>> >>> Aug 12 13:00:16 KSTATION automount[15471]: failed to mount /misc/areas/testdown
>> >>> Aug 12 13:00:43 KSTATION automount[15471]: st_expire: state 1 path /misc
>> >>> --------------------------------------------------------------
>> >>>
>> >>> 2009/8/12 Ian Kent <ikent@xxxxxxxxxx>:
>> >>>> Carlos André wrote:
>> >>>>> Hi Ian,
>> >>>>> I'm going crazy trying to get "retry=" to work on mount... this option
>> >>>>> just DOESN'T WORK if I use proto=tcp and/OR Kerberos (sec=krb5/krb5i/krb5p),
>> >>>>> as you can see in my previous emails...
>> >>>> Right, my mistake for not looking closely enough at your post.
>> >>>>
>> >>>> Maybe this is related to the same sort of problem we had with mount in
>> >>>> the past, before the options parsing went into the kernel, where other
>> >>>> services, like portmapper (or rpcbind), were being done with different
>> >>>> timeout parameters before the RPC calls for mounting. That's just an
>> >>>> example as NFSv4 shouldn't be sensitive to portmapper anyway.
>> >>>>
>> >>>> But what version of autofs and kernel did you say you were using?
>> >>>>
>> >>>>> I appreciate any help.
>> >>>>>
>> >>>>> Carlos.
>> >>>>>
>> >>>>>
>> >>>>> 2009/8/12 Ian Kent <ikent@xxxxxxxxxx>:
>> >>>>>> Chuck Lever wrote:
>> >>>>>>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>> >>>>>>>> This long timeout is good if a workstation needs to mount a critical
>> >>>>>>>> directory using /etc/fstab at boot (for example).
>> >>>>>>>> But in my case this loooong timeout doesn't make any sense,
>> >>>>>>>> since autofs retries the mount on access. In fact it gives me
>> >>>>>>>> a lot of headaches, because user login will just hang if one server goes
>> >>>>>>>> down for any reason, and will hang again if the user tries to access a
>> >>>>>>>> directory pointing to an NFS server that is down...
>> >>>>>>> "retry=0" means the mount command will fail as soon as the first
>> >>>>>>> mount(2) system call fails.  When you set SYN retries to 1, this means
>> >>>>>>> after 9 seconds, the connect fails, and that causes the mount(2) system
>> >>>>>>> call to fail.
>> >>>>>>>
>> >>>>>>> Recent conversations with Ian suggested that a long timeout was desired
>> >>>>>>> for automounter as well as other cases.  Ian, is there something else we
>> >>>>>>> need to consider to determine the correct retry timeout for NFS/TCP
>> >>>>>>> mount points handled via automounter?  How should mount.nfs wait so we
>> >>>>>>> don't make other use cases worse?  (Looks like most of the history is
>> >>>>>>> intact below).
>> >>>>>> Of course we know that autofs is entirely at the mercy of mount(8) (and
>> >>>>>> mount.nfs in particular). This has always been a difficult situation for
>> >>>>>> the automounter because interactive mount invocations should wait. But I
>> >>>>>> believe automount mounts should always time out quickly, but that leads
>> >>>>>> to its own set of problems, especially when home directories are concerned.
>> >>>>>>
>> >>>>>> I think adding "retry=0" is the right thing to do myself but I'm not
>> >>>>>> certain that will work as we expect. I'll have to do some experimentation.
>> >>>>>>
>> >>>>>>> How long do you think is appropriate for the automounter to wait if the
>> >>>>>>> server is down, in your case, Carlos?
>> >>>>>>>
>> >>>>>>>> Am I missing something, or is something weird going on here...!?
>> >>>>>>>> ------------------------------------------------
>> >>>>>>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries  [DEFAULT]
>> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> >>>>>>>> proto=tcp,retry=1
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>
>> >>>>>>>> real    3m9.000s
>> >>>>>>>> user    0m0.002s
>> >>>>>>>> sys     0m0.001s
>> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> >>>>>>>> sec=krb5p,proto=tcp,retry=1
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>
>> >>>>>>>> real    3m9.000s
>> >>>>>>>> user    0m0.000s
>> >>>>>>>> sys     0m0.002s
>> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> >>>>>>>> proto=tcp,retry=0
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>
>> >>>>>>>> real    3m9.001s
>> >>>>>>>> user    0m0.000s
>> >>>>>>>> sys     0m0.003s
>> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> >>>>>>>> sec=krb5p,proto=tcp,retry=0
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>
>> >>>>>>>> real    3m9.001s
>> >>>>>>>> user    0m0.002s
>> >>>>>>>> sys     0m0.001s
>> >>>>>>>>
>> >>>>>>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>> >>>>>>>>
>> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> >>>>>>>> proto=tcp,retry=1
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>
>> >>>>>>>> real    1m3.002s
>> >>>>>>>> user    0m0.000s
>> >>>>>>>> sys     0m0.002s
>> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> >>>>>>>> sec=krb5p,proto=tcp,retry=1
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>
>> >>>>>>>> real    2m6.000s
>> >>>>>>>> user    0m0.000s
>> >>>>>>>> sys     0m0.002s
>> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> >>>>>>>> proto=tcp,retry=0
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>
>> >>>>>>>> real    0m9.003s
>> >>>>>>>> user    0m0.001s
>> >>>>>>>> sys     0m0.002s
>> >>>>>>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>> >>>>>>>> sec=krb5p,proto=tcp,retry=0
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>> >>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>
>> >>>>>>>> real    2m6.001s
>> >>>>>>>> user    0m0.001s
>> >>>>>>>> sys     0m0.002s
>> >>>>>>>> [root@KSTATION ~]#
>> >>>>>>>> ------------------------------------------------
>> >>>>>>>> The max timeout drops to 2m6s when changing tcp_syn_retries from 5 to
>> >>>>>>>> 1... and using retry=0 without Kerberos I got only 9s...
>> >>>>>>>>
>> >>>>>>>> *sigh*
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> 2009/8/10 Chuck Lever <chuck.lever@xxxxxxxxxx>:
>> >>>>>>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>> >>>>>>>>>> Something funny: using the default tcp_syn_retries (5) I got
>> >>>>>>>>>> "3,6,12,24,48,96" sec intervals... but if I change tcp_syn_retries to
>> >>>>>>>>>> 1 I get "3,6,3,6,3,6..." sec intervals...
>> >>>>>>>>> Right.  Normally the RPC client calls the kernel's socket connect
>> >>>>>>>>> function, which does 6 SYN retries.  That one call usually takes
>> >>>>>>>>> longer than the RPC client's connect timeout, so it only makes one
>> >>>>>>>>> connect call, and then fails.
>> >>>>>>>>>
>> >>>>>>>>> Reducing the number of SYN retries per connect attempt causes the RPC
>> >>>>>>>>> client to retry the connect call until its connect timeout expires.
>> >>>>>>>>> Each connect call resets the SYN timeout to 3 seconds.
>> >>>>>>>>>
>> >>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>> >>>>>>>>>> sec=krb5p,proto=tcp
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>>>
>> >>>>>>>>>> real    3m9.000s
>> >>>>>>>>>> user    0m0.000s
>> >>>>>>>>>> sys     0m0.002s
>> >>>>>>>>>>
>> >>>>>>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>> >>>>>>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>> >>>>>>>>>> sec=krb5p,proto=tcp  ("retry=1" = no change)
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>> >>>>>>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>> >>>>>>>>>>
>> >>>>>>>>>> real    2m6.004s
>> >>>>>>>>>> user    0m0.000s
>> >>>>>>>>>> sys     0m0.004s
>> >>>>>>>>>>
>> >>>>>>>>>> (3,6,3,6... secs interval)
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> 2009/8/10 Carlos André <candrecn@xxxxxxxxx>:
>> >>>>>>>>>>> No, I'm just using packages from the CentOS repo...
>> >>>>>>>>>>>
>> >>>>>>>>>>> And you're right about the exponential retries... I monitored the
>> >>>>>>>>>>> traffic with tcpdump and saw SYN retries at 3, 6, 12, 24, 48, 96 secs
>> >>>>>>>>>>> on port 2049...
>> >>>>>>>>>>> I tried using the "retry=1" option on mount without any change... I
>> >>>>>>>>>>> don't want to change the source or the TCP timers... just the NFSv4 client.
>> >>>>>>>>>>>
>> >>>>>>>>>>> 2009/8/10 Chuck Lever <chuck.lever@xxxxxxxxxx>:
>> >>>>>>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>> >>>>>>>>>>>>> Bruce, no... you're right.  I'm describing a situation where my
>> >>>>>>>>>>>>> server died... I need the mount to fail faster (10 or 15 secs max)
>> >>>>>>>>>>>>> than 3 minutes and 9 seconds...
>> >>>>>>>>>>>> The 189 second timeout is likely how long it takes the kernel to
>> >>>>>>>>>>>> give up trying to connect a TCP socket to the server (6 SYN attempts
>> >>>>>>>>>>>> with exponential retries, or something like that).  For stock CentOS
>> >>>>>>>>>>>> 5.3, I think user space does only a DNS lookup for normal NFSv4
>> >>>>>>>>>>>> mounts -- the kernel just tries to connect a TCP socket to port 2049,
>> >>>>>>>>>>>> with no preceding rpcbind request.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS
>> >>>>>>>>>>>> components (kernel, nfs-utils) with something you've built yourself.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>> 2009/8/7 J. Bruce Fields <bfields@xxxxxxxxxxxx>:
>> >>>>>>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>> >>>>>>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <candrecn@xxxxxxxxx>
>> >>>>>>>>>>>>>>> wrote:
>> >>>>>>>>>>>>>>>> Anyone ?
>> >>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>> 2009/7/29 Carlos André <candrecn@xxxxxxxxx>:
>> >>>>>>>>>>>>>>>>> People, I need to get a CentOS 5.3 (updated) NFSv4 server working with
>> >>>>>>>>>>>>>>>>> Kerberos and AutoFS, but I have a problem: if the NFS server goes down I
>> >>>>>>>>>>>>>>>>> get a LOOOOOOONG mount timeout on the CentOS 5.3 (updated) NFSv4 client...
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> Since I need to mount some (3 to 6) dirs during the user logon process,
>> >>>>>>>>>>>>>>>>> if a mount hangs, the user logon hangs. So I want to configure it to time
>> >>>>>>>>>>>>>>>>> out (if the server is down) after 10-15 secs (MAX) on each mount attempt.
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> I already set up a lab and tried a LOT of combinations; here are my
>> >>>>>>>>>>>>>>>>> findings (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using the
>> >>>>>>>>>>>>>>>>> basic command
>> >>>>>>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>> >>>>>>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from the NFS client:
>> >>>>>>>>>>>>>>>>>
>> >>>>>>>>>>>>>>>>> - Once I try to access the mount point using AutoFS (proto=tcp OR
>> >>>>>>>>>>>>>>>>> proto=udp) it hangs for 189 secs (3m9s: real  3m9.001s) until it shows
>> >>>>>>>>>>>>>>>>> the error (mount: mount to NFS server '172.16.0.10' failed: timed out
>> >>>>>>>>>>>>>>>>> (giving up))
>> >>>>>>>>>>>>>>> Sounds like you're hitting the server's grace period.
>> >>>>>>>>>>>>>> I thought he was describing a situation where the server
>> >>>>>>>>>>>>>> is completely gone and isn't coming back, and wondering how to make
>> >>>>>>>>>>>>>> the mount fail faster.  But I may be misunderstanding.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>>> --b.
>> >>>>>>>>>>>>>>
>> >>>>>>>>>>>>> --
>> >>>>>>>>>>>>> To unsubscribe from this list: send the line "unsubscribe
>> >>>>>>>>>>>>> linux-nfs" in
>> >>>>>>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> >>>>>>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >>>>>>>>>>>> --
>> >>>>>>>>>>>> Chuck Lever
>> >>>>>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>>>>>
>> >>>>>>>>> --
>> >>>>>>>>> Chuck Lever
>> >>>>>>>>> chuck[dot]lever[at]oracle[dot]com
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>> --
>> >>>>>>> Chuck Lever
>> >>>>>>> chuck[dot]lever[at]oracle[dot]com
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>
>> >>
>> >>
>> >
>
>