Hi Ian,

I'm going crazy trying to get "retry=" to work on mount... this option
just DOESN'T WORK if I use proto=tcp and/or Kerberos
(sec=krb5/krb5i/krb5p), as you can see in my previous emails...

I'd appreciate any help.

Carlos.

2009/8/12 Ian Kent <ikent@xxxxxxxxxx>:
> Chuck Lever wrote:
>> On Aug 11, 2009, at 8:41 AM, Carlos André wrote:
>>> This long timeout is good if workstation need mount a critical
>>> directory using /etc/fstab on boot (for example)..
>>> But in my case, using this loooong timeout doesnt make any sense,
>>> since autofs retry mount directory on-access. This in fact gives me
>>> alot of headaches, coz user login 'll just hangs if one server goes
>>> down for any reason, and will again hangs if user try access directory
>>> pointing to a NFS down server...
>>
>> "retry=0" means the mount command will fail as soon as the first
>> mount(2) system call fails. When you set SYN retries to 1, this means
>> after 9 seconds, the connect fails, and that causes the mount(2) system
>> call to fail.
>>
>> Recent conversations with Ian suggested that a long timeout was desired
>> for automounter as well as other cases. Ian, is there something else we
>> need to consider to determine the correct retry timeout for NFS/TCP
>> mount points handled via automounter? How should mount.nfs wait so we
>> don't make other use cases worse? (Looks like most of the history is
>> intact below).
>
> Of course we know that autofs is entirely at the mercy of mount(8) (and
> mount.nfs in particular). This has always been a difficult situation for
> the automounter because interactive mount invocations should wait. But I
> believe automount mounts should always time out quickly, but that leads
> to its own set of problems, especially when home directories are concerned.
>
> I think adding "retry=0" is the right thing to do myself but I'm not
> certain that will work as we expect. I'll have to do some experimentation.
>
>>
>> How long do you think is appropriate for the automounter to wait if the
>> server is down, in your case, Carlos?
>>
>>> Am losing something or there have was something weirdo...!?
>>> ------------------------------------------------
>>> [root@KSTATION ~]# echo 5 > /proc/sys/net/ipv4/tcp_syn_retries [DEFAULT]
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.000s
>>> user 0m0.002s
>>> sys 0m0.001s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.000s
>>> user 0m0.000s
>>> sys 0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.001s
>>> user 0m0.000s
>>> sys 0m0.003s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 3m9.001s
>>> user 0m0.002s
>>> sys 0m0.001s
>>>
>>> [root@KSTATION ~]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries [ 5 to 1 ]
>>>
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 6]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 1m3.002s
>>> user 0m0.000s
>>> sys 0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=1
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 2m6.000s
>>> user 0m0.000s
>>> sys 0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 0m9.003s
>>> user 0m0.001s
>>> sys 0m0.002s
>>> [root@KSTATION ~]# time mount 1.2.3.4:/blabla /tmp/ -t nfs4 -o
>>> sec=krb5p,proto=tcp,retry=0
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying). [x 13]
>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>
>>> real 2m6.001s
>>> user 0m0.001s
>>> sys 0m0.002s
>>> [root@KSTATION ~]#
>>> ------------------------------------------------
>>> max timeout goes to 2m6s changing tcp_syn_retries from 5 to 1... and
>>> using retry=0 without kerberos I got only 9s...
>>>
>>> *sigh*
>>>
>>> 2009/8/10 Chuck Lever <chuck.lever@xxxxxxxxxx>:
>>>> On Aug 10, 2009, at 4:05 PM, Carlos André wrote:
>>>>>
>>>>> Something funny: Using default tcp_syn_retries (5) i got
>>>>> "3,6,12,24,48,96" secs interval... but if i change tcp_syn_retries to
>>>>> 1 i got "3,6,3,6,3,6..." secs interval...
>>>>
>>>> Right. Normally the RPC client calls the kernel's socket connect function,
>>>> which does 6 SYN retries. That one call usually takes longer than the RPC
>>>> client's connect timeout, so it only makes one connect call, and then
>>>> fails.
>>>>
>>>> Reducing the number of SYN retries per connect attempt causes the RPC client
>>>> to retry the connect call until its connect timeout expires. Each connect
>>>> call resets the SYN timeout to 3 seconds.
>>>>
>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>> sec=krb5p,proto=tcp
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real 3m9.000s
>>>>> user 0m0.000s
>>>>> sys 0m0.002s
>>>>>
>>>>> [root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
>>>>> [root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o
>>>>> sec=krb5p,proto=tcp ("retry=1" = no change)
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
>>>>> mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).
>>>>>
>>>>> real 2m6.004s
>>>>> user 0m0.000s
>>>>> sys 0m0.004s
>>>>>
>>>>> (3,6,3,6... secs interval)
>>>>>
>>>>> 2009/8/10 Carlos André <candrecn@xxxxxxxxx>:
>>>>>>
>>>>>> No, i'm just using packages from CentOS repo...
>>>>>>
>>>>>> And u're right about expo retries... with tcpdump i've monitored
>>>>>> traffic and i got SYN retries in 3, 6, 12, 24, 48, 96 secs on port
>>>>>> 2049...
>>>>>> I tried use "retry=1" option on mount without any change... I dont
>>>>>> want change source or tcp timers... just NFSv4 client.
>>>>>>
>>>>>> 2009/8/10 Chuck Lever <chuck.lever@xxxxxxxxxx>:
>>>>>>>
>>>>>>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>>>>>>
>>>>>>>> Bruce, no... you're right. I'm describing a situation where my server
>>>>>>>> died... i need mount fail faster (10 or 15 secs max) than 3 minutes
>>>>>>>> and 9 seconds...
>>>>>>>
>>>>>>> The 189 second timeout is likely how long it takes the kernel to give up
>>>>>>> trying to connect a TCP socket to the server (6 SYN attempts with
>>>>>>> exponential retries, or something like that). For stock CentOS 5.3, I think
>>>>>>> user space does only a DNS lookup for normal NFSv4 mounts -- the kernel just
>>>>>>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>>>>>>> request.
>>>>>>>
>>>>>>> Carlos, let us know if you have replaced any NFS-related CentOS components
>>>>>>> (kernel, nfs-utils) with something you've built yourself.
>>>>>>>
>>>>>>>> 2009/8/7 J. Bruce Fields <bfields@xxxxxxxxxxxx>:
>>>>>>>>>
>>>>>>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>>>>>>
>>>>>>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <candrecn@xxxxxxxxx> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Anyone ?
>>>>>>>>>>>
>>>>>>>>>>> 2009/7/29 Carlos André <candrecn@xxxxxxxxx>:
>>>>>>>>>>>>
>>>>>>>>>>>> PPL, I need put a CentOS 5.3 (updated) NFSv4 server to work with Kerberos
>>>>>>>>>>>> and AutoFS, but i got a problem: If NFS server goes down i get a LOOOOOOONG
>>>>>>>>>>>> mount timeout on CentOS 5.3 (updated) NFSv4 client...
>>>>>>>>>>>>
>>>>>>>>>>>> Since i need mount some (3 to 6) dirs at user logon process, if mount hangs,
>>>>>>>>>>>> user logon hangs. Then i want configure it to timeout (if server down) after
>>>>>>>>>>>> 10-15 secs (MAX) on each mount attempt.
>>>>>>>>>>>>
>>>>>>>>>>>> I already make a lab and tried a LOT of combinations, there my findings
>>>>>>>>>>>> (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using basic command
>>>>>>>>>>>> (time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o
>>>>>>>>>>>> sec=krb5,proto=<tcp/udp>) from NFS client:
>>>>>>>>>>>>
>>>>>>>>>>>> - Once i try access mount point using AutoFS (proto=tcp OR proto=udp) it
>>>>>>>>>>>> hangs for 189 secs (3m9s: real 3m9.001s) until show error (mount: mount to
>>>>>>>>>>>> NFS server '172.16.0.10' failed: timed out (giving up))
>>>>>>>>>>
>>>>>>>>>> Sounds like you're hitting the server's grace period.
>>>>>>>>>
>>>>>>>>> I thought he was describing a situation where the server the server
>>>>>>>>> is completely gone and isn't coming back, and wondering how to make the
>>>>>>>>> mount fail faster. But I may be misunderstanding.
>>>>>>>>>
>>>>>>>>> --b.
>>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>> --
>>>>>>> Chuck Lever
>>>>>>> chuck[dot]lever[at]oracle[dot]com
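
For reference, the timings reported in this thread are consistent with
simple SYN backoff arithmetic. The sketch below is only an illustration in
Python, not code from the kernel or nfs-utils. It assumes a 3-second initial
SYN timeout that doubles on every retransmission (the 3, 6, 12, 24, 48, 96
pattern seen with tcpdump), and it assumes each "timed out" message printed
by mount corresponds to one failed connect attempt. The function names are
made up for this example.

```python
def syn_wait_intervals(tcp_syn_retries, initial_timeout=3):
    """Seconds waited after each SYN: the first SYN plus each retry.

    Assumption: the timeout starts at initial_timeout and doubles each
    time, matching the 3, 6, 12, 24, 48, 96 pattern seen in the thread.
    """
    return [initial_timeout * 2 ** i for i in range(tcp_syn_retries + 1)]


def connect_timeout(tcp_syn_retries, initial_timeout=3):
    """Total seconds before a single connect attempt gives up."""
    return sum(syn_wait_intervals(tcp_syn_retries, initial_timeout))


def total_wait(tcp_syn_retries, connect_attempts):
    """Total seconds when the connect is attempted connect_attempts times.

    Assumption: every attempt runs to its full timeout and the backoff
    restarts at 3 seconds on each new attempt.
    """
    return connect_attempts * connect_timeout(tcp_syn_retries)


if __name__ == "__main__":
    print(syn_wait_intervals(5))  # [3, 6, 12, 24, 48, 96]
    print(connect_timeout(5))     # 189 -> 3m9s with the default of 5
    print(connect_timeout(1))     # 9   -> 9s with tcp_syn_retries=1
    print(total_wait(1, 7))       # 63  -> 1m3s: 6 "retrying" + 1 "giving up"
    print(total_wait(1, 14))      # 126 -> 2m6s: 13 "retrying" + 1 "giving up"
```

Read this way, lowering tcp_syn_retries only shortens each connect attempt.
The overall wait still depends on how many times the connect is retried,
which is what the 2m6s results with sec=krb5p show.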