Re: NFSv4: Mounting NFS server which is down, blocks all other NFS mounts on same machine

On Wed, Jun 9, 2021 at 10:31 AM Benjamin Coddington <bcodding@xxxxxxxxxx> wrote:
>
> On 9 Jun 2021, at 1:31, Michael Wakabayashi wrote:
>
> > Hi Olga,
> >
> > There seems to be a discrepancy between what you're seeing and what
> > we're seeing.
> >
> > So we were wondering if you can please run these commands in your
> > Linux environment and paste the output of the mount command below?
> >     $ sudo mkdir -p /tmp/mnt.dead
> >     $ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
> >
> > We'd like the mount command to specifically use "2.2.2.2:/fake_path"
> > since we know it is unreachable and outside your subnet.
> > We're hoping by mounting "2.2.2.2:/fake_path" you'll be able to
> > reproduce the same behavior that we're seeing.
> >
> > Also, if possible, a packet trace would be helpful:
> >     $ sudo tcpdump -s 0 -w /tmp/nfsv4.pcap port 2049
> >
> > On my Ubuntu VirtualMachine, I see this output:
> >     ubuntu@mikes-ubuntu-21-04:~$ time sudo mount -o vers=4 -vvv 2.2.2.2:/fake_path /tmp/mnt.dead
> >     mount.nfs: timeout set for Wed Jun  9 05:12:15 2021
> >     mount.nfs: trying text-based options 'vers=4,addr=2.2.2.2,clientaddr=10.162.132.231'
> >     mount.nfs: mount(2): Connection timed out
> >     mount.nfs: Connection timed out
> >     real  3m1.257s
> >     user  0m0.006s
> >     sys 0m0.007s
> >
> > Thanks, Mike
>
> It looks to me like you and Olga are seeing the same thing, a wait
> through SYN retries scaling up from initial RTO for the number of
> tcp_syn_retries.

Ben, I disagree. Mike and I are seeing different things. Mike is
seeing SYNs being sent. I argue that SYNs should not be sent. I agree
that if SYNs are sent, that would cause a problem.
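
For reference, the wait Ben describes can be estimated with a small
sketch. It assumes the retransmission timeout starts at 1 second and
doubles after each unanswered SYN, capped at 120 seconds, and it ignores
any timeout the RPC layer or mount.nfs applies on top. The function name
is made up for illustration.

```shell
#!/bin/sh
# Estimate how long a connect() to a silent host waits before giving up,
# assuming the timeout doubles after every unanswered SYN.
# Arguments: initial RTO in seconds, number of SYN retries.
syn_wait_seconds() {
    cur_rto=$1
    max_retries=$2
    max_rto=120   # assumed cap on a single timeout
    total=0
    attempt=0
    # One initial SYN plus $max_retries retransmissions, each followed by a wait.
    while [ "$attempt" -le "$max_retries" ]; do
        total=$((total + cur_rto))
        cur_rto=$((cur_rto * 2))
        if [ "$cur_rto" -gt "$max_rto" ]; then
            cur_rto=$max_rto
        fi
        attempt=$((attempt + 1))
    done
    echo "$total"
}

# Read the local setting if it is readable; otherwise assume 6 retries.
retries=$(cat /proc/sys/net/ipv4/tcp_syn_retries 2>/dev/null || echo 6)
echo "estimated SYN wait: $(syn_wait_seconds 1 "$retries")s"
```

With an initial RTO of 1 second and 6 retries this gives
1+2+4+8+16+32+64 = 127 seconds.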

> It's not disputed that mounts waiting on the transport layer will block
> other mounts.
>
> It might be able to be changed:  there's this torch:
> https://lore.kernel.org/linux-nfs/87378omld4.fsf@xxxxxxxxxxxxxxxxxxxxxxxx/

We already discussed that this is not a solution as the NFS layer has
to serialize the client creation attempts.

> ..or there may be another way we don't have to wait ..
>
> .. or tune tcp_syn_retries.. or RTO.. or something else (eBPF?).
>
> I think we're all strapped for time and problems like this usually get
> fixed by the folks feeling the most pain from them.

I think we still don't understand what network setup leads to a client
sending a SYN (which is incorrect) to what is supposed to be an
unreachable server, instead of timing out fast (because there shouldn't
be an ARP entry).

Mike, can you show your arp cache info (arp -n) during your run?
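
Something along these lines, run in a second terminal while the mount is
still hanging, would capture that. It is only a sketch: it assumes the
iproute2 tools (ip, ss) are installed, and the function name is
hypothetical.

```shell
#!/bin/sh
# Show how the kernel would reach a server, what is in the neighbour (ARP)
# cache, and whether any connection to the NFS port is stuck in SYN-SENT.
show_path_state() {
    server=$1

    # Which route, gateway and interface the kernel picks for the server.
    ip route get "$server"

    # Current neighbour (ARP) cache; "arp -n" shows the same information.
    ip neigh show

    # TCP sockets currently in SYN-SENT towards port 2049.
    ss -tn state syn-sent '( dport = :2049 )'
}
```

For the test case above that would be: show_path_state 2.2.2.2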

>
> Ben
>


