On 9 Jun 2021, at 1:31, Michael Wakabayashi wrote:
Hi Olga,
There seems to be a discrepancy between what you're seeing and what
we're seeing.
So we were wondering if you could please run these commands in your
Linux environment and paste the output of the mount command below?
$ sudo mkdir -p /tmp/mnt.dead
$ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
We'd like the mount command to specifically use "2.2.2.2:/fake_path"
since we know it is unreachable and outside your subnet.
We're hoping by mounting "2.2.2.2:/fake_path" you'll be able to
reproduce the same behavior that we're seeing.
Also, if possible, a packet trace would be helpful:
$ sudo tcpdump -s 0 -w /tmp/nfsv4.pcap port 2049
On my Ubuntu virtual machine, I see this output:
ubuntu@mikes-ubuntu-21-04:~$ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
mount.nfs: timeout set for Wed Jun 9 05:12:15 2021
mount.nfs: trying text-based options 'vers=4,addr=2.2.2.2,clientaddr=10.162.132.231'
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out
real 3m1.257s
user 0m0.006s
sys 0m0.007s
Thanks, Mike
It looks to me like you and Olga are seeing the same thing: the mount
waits while TCP retransmits the SYN, with the timeout backing off from
the initial RTO until tcp_syn_retries retransmissions are used up.
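
To put rough numbers on that wait, here is a minimal sketch of the backoff
arithmetic. It assumes the timeout doubles after every unanswered SYN and
ignores any upper bound on the RTO. The function name syn_wait_total and the
sample values (initial RTO of 1 second, tcp_syn_retries of 6) are assumptions
for illustration, not measurements from either setup.

```shell
#!/bin/sh
# Sum the exponential SYN backoff: the initial SYN waits one RTO, and each
# of the `retries` retransmissions waits twice as long as the one before.
# Both arguments are assumed values, not measured ones.
syn_wait_total() {
    rto=$1        # initial RTO in whole seconds (assumed)
    retries=$2    # value of net.ipv4.tcp_syn_retries (assumed)
    total=0
    i=0
    # One wait for the initial SYN plus one per retransmission.
    while [ "$i" -le "$retries" ]; do
        total=$((total + rto))
        rto=$((rto * 2))
        i=$((i + 1))
    done
    echo "$total"
}

# 1 + 2 + 4 + 8 + 16 + 32 + 64 seconds
syn_wait_total 1 6    # prints 127
```

This is only the arithmetic for a single connect attempt under those assumed
values; it is not meant to reproduce the 3m1s figure above exactly.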
It's not disputed that mounts waiting on the transport layer will block
other mounts.
That behavior might be changeable. There's this torch:
https://lore.kernel.org/linux-nfs/87378omld4.fsf@xxxxxxxxxxxxxxxxxxxxxxxx/
...or there may be another way to avoid the wait...
...or tune tcp_syn_retries, or the RTO, or something else (eBPF?).
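
For the tcp_syn_retries option, a minimal sketch of checking the current
value before experimenting. The helper name show_syn_retries is made up for
this example; the /proc path is the standard location of the
net.ipv4.tcp_syn_retries sysctl on Linux. The commented-out write is
system-wide and affects every outgoing TCP connection, not just NFS mounts,
and the value 3 is an arbitrary example rather than a recommendation.

```shell
#!/bin/sh
# Print the value stored in a sysctl file, or "unavailable" if the file
# cannot be read (for example on a non-Linux host).
show_syn_retries() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unavailable"
    fi
}

show_syn_retries /proc/sys/net/ipv4/tcp_syn_retries

# To lower it temporarily for a test (needs root, lasts until reboot):
#   sudo sysctl -w net.ipv4.tcp_syn_retries=3
```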
I think we're all strapped for time and problems like this usually get
fixed by the folks feeling the most pain from them.
Ben