On Wed, 2020-06-17 at 08:34 -0500, Patrick Goetz wrote:
> Hi -
>
> We've been having strange performance problems for over a year on
> high-powered workstations with fast (10Gb) networking. Programs that
> run instantly on users' laptops would take 5-10 minutes to even launch
> on these dual Xeon workstations with 128GB of RAM. The OS is Ubuntu
> 18.04.
>
> Last week I finally tracked this down to a comedy of errors, the first
> involving autofs. We only use NFS v.4.x, and thanks to a tip from Ian
> Kent, I had added:
>
>   mount_nfs_default_protocol = 4
>
> to /etc/autofs.conf. However, with this setting and this entry in
> /etc/auto.home:
>
>   * octopus.biosci.utexas.edu:/home/&
>
> any attempt to access a non-existent directory in /home (e.g.
> /home/syslog) would result in the automounter hanging while various
> attempts to mount were executed. Changing the auto.home line to
>
>   * -tcp,vers=4.2 octopus.biosci.utexas.edu:/home/&
>
> resolved this issue. I can understand the tcp option, but what doesn't
> make sense to me is the necessity to also specify vers=4.2.
> If I remove vers=4.2 from the options list, the automounter starts
> hanging again when asked to mount a non-existent directory.
>
> Maybe this is just an issue with mount.nfs?

It's hard to say; possibly. That would need a debug log to check.

>
> This isn't relevant to autofs, but to satisfy the curious, the next
> obvious question is why on earth were people attempting to mount
> non-existent directories in /home, and the answer is (in part) some
> negligent Debian package management. After sifting through strace
> output I noticed entries like this in /etc/passwd:
>
>   syslog:x:102:106::/home/syslog:/usr/sbin/nologin
>
> I guess they were thinking that since no one can log in as syslog, it
> doesn't matter what the home directory is set to; likely someone who
> doesn't use or perhaps even know about autofs.
> On my Arch systems these entries would look like this:
>
>   syslog:x:102:106::/:/usr/sbin/nologin

Could be something to do with:

# mount_wait - time to wait for a response from mount(8).
#              Setting this timeout can cause problems when
#              mount would otherwise wait for a server that
#              is temporarily unavailable, such as when it's
#              restarting. The default setting (-1) of waiting
#              for mount(8) usually results in a wait of around
#              3 minutes.
#
#mount_wait = -1

But setting this to some appropriate value might introduce other
problems: it's hard to track down those waiting sub-processes and
kill them, so they get left to die when they time out.

There are also issues with the kernel RPC code (the kernel mostly
handles the mount these days) stubbornly not returning failures
quickly. That behaviour is needed to mitigate potential data
corruption of remote file systems, but it also affects things like
mounting.

>
> The final piece of the puzzle is why on earth do misconfigured entries
> in /etc/passwd come into play, and the answer is anaconda/miniconda. A
> lot of computational biology tools are embedded in conda environments,
> and for some frankly inexplicable reason conda likes to troll through
> /etc/passwd searching for environments. This seems like an anachronism
> from the days when all real users were documented in /etc/passwd
> rather than via some LDAP, AD, NIS, or Kerberos directory server.
>

But if you use the wildcard map entry there's no way to know that the
entry being looked up doesn't exist, so autofs is duty bound to try
and mount the thing.

If the problem is bad enough it might be worth adding some entries
above the wildcard entry that bind mount to one or more directories
to keep those applications happy.

Ian
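
To work out which keys would need explicit entries above the wildcard,
something like the following Python sketch could list the candidates.
It is only an illustration, not something from this thread: the
function name, the /home default, and the rule that a shell ending in
"nologin" or "false" marks a non-login account are all assumptions.
It parses passwd-format text only and never touches the home
directories, since accessing /home/syslog would itself trigger the
automount attempt.

```python
def homes_under_automount(passwd_text, home_root="/home"):
    """Return (user, home) pairs for non-login accounts with a home under home_root.

    Assumption (not from the thread): a login shell ending in "nologin"
    or "false" identifies a non-login account. Only the text is parsed;
    the directories themselves are never accessed.
    """
    prefix = home_root.rstrip("/") + "/"
    hits = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:
            continue  # skip malformed entries
        user, home, shell = fields[0], fields[5], fields[6]
        if home.startswith(prefix) and shell.endswith(("nologin", "false")):
            hits.append((user, home))
    return hits


# Sample input built from the two passwd entries quoted above.
sample = (
    "syslog:x:102:106::/home/syslog:/usr/sbin/nologin\n"
    "syslog2:x:103:107::/:/usr/sbin/nologin\n"
)
for user, home in homes_under_automount(sample):
    print(user, home)
```

Passing it the contents of /etc/passwd instead of the sample string
gives the list of keys to consider; whether each one deserves its own
map entry is still a judgment call.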