Re: NFS server (round-robin IP) times out: How does autofs behave? How can we fix that on the client side?

On 23/12/18 04:30, Ian Kent wrote:
On Fri, 2018-12-21 at 11:02 +0100, Frank Thommen wrote:
Dear all.

@work we are struggling with NFS server timeouts and subsequently
missing mounts on the clients:

Sorry for the multiple posts on this but things often occur to me
as I think about what's been written upon re-reading questions.


[...]
Dec 21 10:12:20 XXX kernel: nfs: server SRV not responding, timed out
Dec 21 10:12:20 XXX automount[41879]: mount(nfs): nfs: mount failure
SRV:/a/b/c on /d/e/f
[...]

The server timing out is a storage cluster with multiple IPs, served in
round-robin mode.  Does autofs in cases of connectivity problems try to
resolve the server name multiple times - and then maybe get a "good" IP
- or is it "stuck" on the IP it gets when the initial mount request is
made?

Another possibility comes to mind.

If the problem is related purely to server selection for mount there
was a problem with that in the past.

It occurred specifically when the server name resolved to multiple
addresses.

The availability probe would be done to select a host for mounting but
because there was a round-robin DNS in place the subsequent mount would
end up using a different address, possibly of a host that was no longer
responding.
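
To make the mismatch described above concrete, here is a small simulation. It is only a sketch and not autofs code: the class RoundRobinResolver and the functions mount_by_name and mount_by_address are hypothetical names invented for illustration, and the "alive" set stands in for whatever the availability probe would find.

```python
from itertools import cycle


class RoundRobinResolver:
    """Toy stand-in for round-robin DNS: each lookup returns the next address."""

    def __init__(self, addresses):
        self._next = cycle(addresses)

    def resolve(self, _hostname):
        return next(self._next)


def mount_by_name(resolver, hostname, alive):
    """Probe via one lookup, then mount via a second, independent lookup."""
    probed = resolver.resolve(hostname)   # address seen by the availability probe
    if probed not in alive:
        return (probed, None, False)
    used = resolver.resolve(hostname)     # mount resolves the name again
    return (probed, used, used in alive)


def mount_by_address(resolver, hostname, alive):
    """Probe via one lookup, then mount using that same address."""
    probed = resolver.resolve(hostname)
    if probed not in alive:
        return (probed, None, False)
    return (probed, probed, True)         # no second lookup, so no mismatch


if __name__ == "__main__":
    addresses = ["1.2.3.1", "1.2.3.2"]
    alive = {"1.2.3.1"}                   # assumption: only one head node responds
    print(mount_by_name(RoundRobinResolver(addresses), "SRV", alive))
    print(mount_by_address(RoundRobinResolver(addresses), "SRV", alive))
```

In the first call the probe sees a responding address but the mount lands on a different one, which is the failure mode in question. In the second call the probed address is reused, so the mount goes to the host that was actually checked.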

That problem was resolved by using the IP address instead of the host
name for this case. Some people didn't much like that because the use
of an IP address made it more difficult to work out what was going on
when looking at logs.

I normally don't like IP addresses in any configuration for various reasons, but in the current case they could effectively help us, as the `mount` timeout message would report the actual IP of the used head node and not the hostname of the storage cluster. So instead of

  mymount  our.storage.server:/export/share

we would have

  mymount  1.2.3.1,1.2.3.2,1.2.3.3:/export/share

so that `mount` would target the individual IP addresses of the head nodes instead of the storage cluster's shared name.
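
If the list of head node addresses should not be maintained by hand, an entry like the one above could be generated from DNS. The following is a minimal sketch under some assumptions: the helper names resolve_all_ipv4 and build_replicated_entry are made up for this example, only IPv4 is considered, and it relies on the resolver returning every address record for the name in a single answer.

```python
import ipaddress
import socket


def resolve_all_ipv4(hostname):
    """Return the de-duplicated IPv4 addresses reported for hostname, numerically sorted."""
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    unique = {info[4][0] for info in infos}
    return sorted(unique, key=ipaddress.ip_address)


def build_replicated_entry(key, addresses, export):
    """Format a map line of the form 'key  addr1,addr2,...:/export'."""
    if not addresses:
        raise ValueError("need at least one address")
    return f"{key}  {','.join(addresses)}:{export}"


if __name__ == "__main__":
    # Host name and export path are the placeholders used in this thread.
    ips = resolve_all_ipv4("our.storage.server")
    print(build_replicated_entry("mymount", ips, "/export/share"))
```

The generated line would still need to be reviewed and regenerated whenever the cluster's address set changes, which is one of the usual drawbacks of putting IP addresses in a map.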


The trick here is first checking that autofs is doing the availability
probe for the map entry you're using (which it might not be) and then
checking mount attempts are using IP address at mount time, not host
name.

I'm not sure I understand this statement.


So we would need to check the functionality of the autofs you are using
if you think it's worth going further with this.

If you think that the replicated server setup should work, then we will try it. However, due to bank holidays & co. we will not be able to implement this in the next two weeks (and hence I will not be able to report success or failure very soon).

frank



Ian






