Re: AutoFS+NFSv4 server down = LOOOOONG timeout.

Carlos André <candrecn@xxxxxxxxx> · Mon, 10 Aug 2009 17:05:10 -0300

Something funny: with the default tcp_syn_retries (5) I get SYN retry
intervals of "3,6,12,24,48,96" secs, but if I change tcp_syn_retries to
1 I get "3,6,3,6,3,6..." secs instead.

[root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real    3m9.000s
user    0m0.000s
sys     0m0.002s

[root@KSERVER /]# echo 1 > /proc/sys/net/ipv4/tcp_syn_retries
[root@KSERVER mnt]# time mount 1.2.3.4:/blabla tmp/ -t nfs4 -o sec=krb5p,proto=tcp
(adding "retry=1" to the options makes no difference)
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (retrying).
mount: mount to NFS server '1.2.3.4' failed: timed out (giving up).

real    2m6.004s
user    0m0.000s
sys     0m0.004s

(3,6,3,6... secs interval)
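
For what it's worth, both totals add up if each connect attempt simply
lasts the sum of its SYN intervals. A rough sketch in Python: the
3-second initial timeout and the doubling are taken from the intervals
above, the function name is made up for illustration, and reading the
second run as 14 connect attempts is an assumption based on the 14
"timed out" lines.

```python
def syn_intervals(tcp_syn_retries, initial_timeout=3):
    """Seconds waited after the first SYN and after each retransmitted SYN,
    assuming the timeout starts at initial_timeout and doubles each time."""
    return [initial_timeout * 2 ** i for i in range(tcp_syn_retries + 1)]

# Default tcp_syn_retries=5: one connect attempt before mount gives up.
default = syn_intervals(5)
print(default, sum(default))

# tcp_syn_retries=1: each attempt is shorter, but mount retried
# (assumed 14 attempts, one per "timed out" line in the output above).
short = syn_intervals(1)
attempts = 14
print(short, sum(short) * attempts)
```

That gives 189 secs (3m9s) for the default and 126 secs (2m6s) with
tcp_syn_retries set to 1, matching both "real" times above.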

2009/8/10 Carlos André <candrecn@xxxxxxxxx>:
> No, I'm just using packages from the CentOS repo...
>
> And you're right about the exponential retries: with tcpdump I monitored
> the traffic and saw SYN retries at 3, 6, 12, 24, 48, 96 secs on port
> 2049...
> I tried the "retry=1" mount option without any change... I don't
> want to change the source or the TCP timers, just the NFSv4 client.
>
> 2009/8/10 Chuck Lever <chuck.lever@xxxxxxxxxx>:
>> On Aug 10, 2009, at 2:29 PM, Carlos André wrote:
>>>
>>> Bruce, no... you're right.  I'm describing a situation where my server
>>> died... I need the mount to fail faster (10 or 15 secs max) than 3 minutes
>>> and 9 seconds...
>>
>> The 189 second timeout is likely how long it takes the kernel to give up
>> trying to connect a TCP socket to the server (6 SYN attempts with
>> exponential retries, or something like that).  For stock CentOS 5.3, I think
>> user space does only a DNS lookup for normal NFSv4 mounts -- the kernel just
>> tries to connect a TCP socket to port 2049, with no preceding rpcbind
>> request.
>>
>> Carlos, let us know if you have replaced any NFS-related CentOS components
>> (kernel, nfs-utils) with something you've built yourself.
>>
>>> 2009/8/7 J. Bruce Fields <bfields@xxxxxxxxxxxx>:
>>>>
>>>> On Fri, Aug 07, 2009 at 09:42:18AM +0300, Benny Halevy wrote:
>>>>>
>>>>> On Aug. 07, 2009, 3:18 +0300, Carlos André <candrecn@xxxxxxxxx> wrote:
>>>>>>
>>>>>> Anyone ?
>>>>>>
>>>>>> 2009/7/29 Carlos André <candrecn@xxxxxxxxx>:
>>>>>>>
>>>>>>> People, I need to get a CentOS 5.3 (updated) NFSv4 server working with
>>>>>>> Kerberos and AutoFS, but I have a problem: if the NFS server goes down
>>>>>>> I get a LOOOOOOONG mount timeout on the CentOS 5.3 (updated) NFSv4
>>>>>>> client...
>>>>>>>
>>>>>>> Since I need to mount some (3 to 6) dirs during the user logon process,
>>>>>>> if a mount hangs, the user logon hangs. So I want to configure it to
>>>>>>> time out (if the server is down) after 10-15 secs (MAX) on each mount
>>>>>>> attempt.
>>>>>>>
>>>>>>> I already set up a lab and tried a LOT of combinations. Here are my
>>>>>>> findings (server DOWN IP: 172.16.0.10 / client IP: 172.16.1.10) using
>>>>>>> this basic command from the NFS client:
>>>>>>>
>>>>>>> time mount 172.16.0.10:/remotedir /localdir/ -t nfs4 -o sec=krb5,proto=<tcp/udp>
>>>>>>>
>>>>>>> - Once I try to access the mount point using AutoFS (proto=tcp OR
>>>>>>> proto=udp) it hangs for 189 secs (3m9s: real  3m9.001s) until it shows
>>>>>>> the error (mount: mount to NFS server '172.16.0.10' failed: timed out
>>>>>>> (giving up))
>>>>>
>>>>> Sounds like you're hitting the server's grace period.
>>>>
>>>> I thought he was describing a situation where the server
>>>> is completely gone and isn't coming back, and wondering how to make the
>>>> mount fail faster.  But I may be misunderstanding.
>>>>
>>>> --b.
>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>> --
>> Chuck Lever
>> chuck[dot]lever[at]oracle[dot]com