On Fri, 2018-12-21 at 11:02 +0100, Frank Thommen wrote:
> Dear all.
>
> @work we are struggling with NFS server timeouts and subsequently
> missing mounts on the clients:

Sorry for the multiple posts on this, but things often occur to me
as I re-read questions and think about what's been written.

> [...]
> Dec 21 10:12:20 XXX kernel: nfs: server SRV not responding, timed out
> Dec 21 10:12:20 XXX automount[41879]: mount(nfs): nfs: mount failure
> SRV:/a/b/c on /d/e/f
> [...]
>
> The server timing out is a storage cluster with multiple IPs, served in
> round-robin mode.  Does autofs in cases of connectivity problems try to
> resolve the server name multiple times - and then maybe get a "good" IP
> - or is it "stuck" on the IP it gets when the initial mount request is
> made?

Another possibility comes to mind.

If the problem is purely one of server selection at mount time, there
was a problem with that in the past. It occurred specifically when the
server name resolved to multiple addresses.

autofs would do the availability probe to select a host for mounting,
but because round-robin DNS was in place the subsequent mount could end
up using a different address, possibly that of a host that was no
longer responding.

That problem was resolved by using the IP address instead of the host
name for the mount in this case. Some people didn't much like that,
because seeing IP addresses rather than host names in the logs made it
harder to work out what was going on.

The trick here is to first check that autofs is doing the availability
probe for the map entry you're using (which it might not be), and then
to check that mount attempts are using the IP address, not the host
name, at mount time.

So we would need to check the functionality of the autofs version you
are using, if you think it's worth going further with this.

Ian
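
To illustrate the idea described above (resolve the name once, probe the
addresses, then mount by the address that answered instead of by name),
here is a rough Python sketch. It is not autofs code and does not show
how autofs itself probes servers. The function names, the TCP connect to
port 2049 as a stand-in for the availability probe, and the 2 second
timeout are all assumptions made for the example.

```python
import socket


def resolve_all(host, port=2049):
    """Return the distinct IP addresses the name resolves to right now."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addrs = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = sockaddr[0]
        if addr not in addrs:
            addrs.append(addr)
    return addrs


def tcp_probe(addr, port=2049, timeout=2.0):
    """Crude availability check (assumption): can we connect to the NFS port?"""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False


def pick_mount_source(host, export, resolve=resolve_all, probe=tcp_probe):
    """Resolve once, probe each address, and pin the mount to one that answered.

    Returns a mount source such as "192.0.2.2:/a/b/c", or None if no
    address responded. Because the result contains the address and not
    the host name, a later round-robin lookup cannot hand the mount a
    different (possibly dead) server.
    """
    for addr in resolve(host):
        if probe(addr):
            # IPv6 literals are written in square brackets in an NFS mount source.
            if ":" in addr:
                return f"[{addr}]:{export}"
            return f"{addr}:{export}"
    return None
```

Calling pick_mount_source("SRV", "/a/b/c") would give the string to hand
to mount in place of SRV:/a/b/c. The cost is the one mentioned above:
the logs then show an address rather than the server name.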