On Fri, 2018-12-21 at 11:02 +0100, Frank Thommen wrote:
> Dear all.
>
> @work we are struggling with NFS server timeouts and subsequentially
> missing mounts on the clients:
>
> [...]
> Dec 21 10:12:20 XXX kernel: nfs: server SRV not responding, timed out
> Dec 21 10:12:20 XXX automount[41879]: mount(nfs): nfs: mount failure
> SRV:/a/b/c on /d/e/f
> [...]
>
> The server timing out is a storage cluster with multiple IPs, served in
> round-robin mode. Does autofs in cases of connectivity problems try to
> resolve the server name multiple times - and then maybe get a "good" IP
> - or is it "stuck" on the IP it get's when the initial mount request is
> made?

Quite apart from autofs, I have seen problems (in at least one case) with
servers that have multiple IP addresses when more than one of those
addresses is reachable from the clients.

IIRC (and it was a long time ago now), if an RPC is sent on one interface
and the reply is received on another, things go bad.

I don't think that happens very often, but when it does it's a problem.

I can't remember the details now, but the resolution involved not only
ensuring the routing table was correct but also adding static ARP entries
in specific places to ensure that RPC request/reply exchanges are
completed on the same interface.

Ian
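
For anyone debugging a similar round-robin setup, a rough sketch like the one below lists every address the client's resolver returns for a server name, which helps when checking routing and ARP for each address in turn. It says nothing about what autofs or mount.nfs do internally; the helper names unique_addresses and resolve_all are made up for illustration, and port 2049 is only assumed as the usual NFS port.

```python
import socket


def unique_addresses(addrinfo):
    """Return the distinct IP addresses from a getaddrinfo() result,
    in the order the resolver first reported them."""
    seen = []
    for _family, _type, _proto, _canonname, sockaddr in addrinfo:
        ip = sockaddr[0]  # first element is the address for both IPv4 and IPv6
        if ip not in seen:
            seen.append(ip)
    return seen


def resolve_all(host, port=2049):
    """Ask the system resolver for every address behind `host`.
    The default port 2049 is an assumption (the usual NFS port)."""
    info = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return unique_addresses(info)


if __name__ == "__main__":
    import sys

    # Usage: python3 resolve_all.py SRV [SRV2 ...]
    for name in sys.argv[1:]:
        print(name, resolve_all(name))
```

Running it a few times against the server name shows whether the order of addresses rotates between lookups, and each address it prints can then be checked separately from the client.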