On Fri, 2018-12-21 at 16:15 +0100, Frank Thommen wrote:
> On 12/21/18 11:02 AM, Frank Thommen wrote:
> > Dear all.
> >
> > @work we are struggling with NFS server timeouts and subsequently
> > missing mounts on the clients:
> >
> > [...]
> > Dec 21 10:12:20 XXX kernel: nfs: server SRV not responding, timed out
> > Dec 21 10:12:20 XXX automount[41879]: mount(nfs): nfs: mount failure SRV:/a/b/c on /d/e/f
> > [...]
> >
> > The server timing out is a storage cluster with multiple IPs, served in
> > round-robin mode. Does autofs in cases of connectivity problems try to
> > resolve the server name multiple times - and then maybe get a "good" IP
> > - or is it "stuck" on the IP it gets when the initial mount request is
> > made?
> >
> > If autofs does not re-resolve server names: Is there a way to provide
> > autofs with multiple names/IPs which autofs tries all to find a possibly
> > working head node? How would this have to be configured?
>
> I found the "Replicated Server" feature. How does autofs use the
> different entries? Does it make a "round-robin" on its own? And how
> does autofs behave, if one of the multiple entries is not reachable or
> the NFS server times out?

Because autofs has no control over what happens to an NFS mount once it
is mounted, it can't do any "fail-over" of active NFS mounts. This
feature would need to be implemented in NFS itself, not autofs.

All autofs can do, when given a list of replicated servers on which it
can find the same file system, is try each of them at initial mount
time until it gets one that works. I would have to look at the code, but
I think I do the same thing when an NFS server name resolves to multiple
addresses via DNS.

Note that even if the kernel NFS folks implemented fail-over, it would
likely be for "read-only" replicated file systems only, due to the
problems of cache coherency between servers in the writeable case (read:
risk of file system corruption).

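To make the "try each of them at initial mount time" idea concrete, here
is a rough Python sketch. It is not the autofs code (autofs is written in
C and I have not checked how closely this matches it); the function
names, the TCP probe on port 2049 and the timeout are all assumptions
made for illustration only.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_all(host: str, port: int = 2049) -> List[str]:
    """Return every distinct address the name resolves to, in resolver order."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addrs: List[str] = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        if sockaddr[0] not in addrs:
            addrs.append(sockaddr[0])
    return addrs


def tcp_probe(addr: str, port: int = 2049, timeout: float = 2.0) -> bool:
    """Hypothetical liveness check: can a TCP connection to the NFS port be opened?"""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_server(candidates: Iterable[str],
                probe: Callable[[str], bool] = tcp_probe) -> Optional[str]:
    """Walk the candidates once, at mount time, and return the first that answers.

    Nothing here watches the server afterwards: once the mount exists,
    whatever happens to it is up to the NFS client, not to this code.
    """
    for addr in candidates:
        if probe(addr):
            return addr
    return None
```

The candidate list can come either from a replicated map entry or from
resolve_all() on a single round-robin name. If pick_server() returns
None, the mount attempt fails; a later access simply triggers a new
attempt with a fresh walk through the list.
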
autofs does the mount-time server selection itself because it isn't
implemented in the NFS client. It doesn't check or enforce the
"read-only" requirement; it can get away with that because it only
acts at mount time.

Ian