On Mon, 2016-06-27 at 16:04 +0200, Cyril B. wrote:
> On 06/27/2016 02:26 AM, Ian Kent wrote:
> > How is autofs configured?
> >
> > If --disable-mount-locking is not used then any mount can block all other
> > mounts; if it is used then there can be mtab corruption if still using a
> > text-based mtab.
>
> I use --disable-mount-locking.
>
> > I always use --disable-mount-locking and nowadays the mtab is usually a
> > symlink into the proc file system, so corruption isn't a problem.
>
> /etc/mtab is actually not a symlink on my systems.
>
> Anyway, I have more details for you as the issue appeared today and I
> could investigate some more. This is on a server that only mounts one
> single NFS server (http12), so the multi-server blocking issue is
> irrelevant here.
>
> A few minutes before the "deadlock" occurred, /nfs/http12 was unmounted
> by autofs, I assume because it was idle. I have TIMEOUT=600. That
> explains why the issue appears much more frequently on a server which is
> way less busy (and usually in the middle of the night): the NFS server
> needs to be idle enough to be unmounted.
>
> However, I still had many /home/userX mounted (by autofs), which point
> to /nfs/http12/userX. Shouldn't autofs avoid unmounting /nfs/http12 while
> at least one /home/userX is mounted? To be clear, here's an extract from my
> /proc/mounts BEFORE the NFS server is unmounted by autofs:

This doesn't look like the full picture. Where are the autofs file system
mounts?

Mounted doesn't necessarily mean busy or in use; that also depends on the
mount hierarchy and how it has been constructed.

And symlinks are never "busy": they can't be a current working directory and
they aren't opened as a file.

> http12:/ /nfs/http12 nfs4 rw,nosuid,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp6,timeo=1000,retrans=2,sec=sys,clientaddr=2a00:42:1:50:1::1,local_lock=none,addr=2a00:42:1:20:1::1 0 0

This looks like an nfs4 fsid 0 mount.
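The question of where the autofs mounts are can be checked mechanically against the mount table. Below is a minimal Python sketch, assuming the standard whitespace-separated /proc/mounts layout (source, mount point, type, options, dump, pass). It parses that format and lists the mount points of a given file system type, and also checks whether /etc/mtab resolves into /proc. The names `is_proc_symlink`, `parse_mounts` and `mountpoints_of_type` are hypothetical helpers, not part of autofs. The sample data are the three entries from the extract in this thread, with the NFS options abbreviated.

```python
import os

# Hypothetical helpers for inspecting the mount table; not part of autofs.


def is_proc_symlink(path="/etc/mtab"):
    """Return True if path is a symlink that resolves into /proc (e.g. /proc/self/mounts)."""
    if not os.path.islink(path):
        return False
    return os.path.realpath(path).startswith("/proc/")


def parse_mounts(text):
    """Parse /proc/mounts-format text into (source, mountpoint, fstype, options) tuples.

    Mount points containing spaces appear octal-escaped (\\040) and are left as-is.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        source, mountpoint, fstype, options = fields[:4]
        entries.append((source, mountpoint, fstype, options.split(",")))
    return entries


def mountpoints_of_type(entries, fstype):
    """Return the mount points whose file system type equals fstype."""
    return [mountpoint for _, mountpoint, t, _ in entries if t == fstype]


# The three entries from the /proc/mounts extract in this thread,
# with the NFS options abbreviated for brevity.
SAMPLE = """\
http12:/ /nfs/http12 nfs4 rw,nosuid,noatime,vers=4.0,soft,proto=tcp6 0 0
http12://user1 /home/user1 nfs4 rw,nosuid,noatime,vers=4.0,soft,proto=tcp6 0 0
http12://user2 /home/user2 nfs4 rw,nosuid,noatime,vers=4.0,soft,proto=tcp6 0 0
"""

if __name__ == "__main__":
    entries = parse_mounts(SAMPLE)
    print("autofs:", mountpoints_of_type(entries, "autofs"))  # autofs: []
    # nfs4: ['/nfs/http12', '/home/user1', '/home/user2']
    print("nfs4:", mountpoints_of_type(entries, "nfs4"))
    print("/etc/mtab resolves into /proc:", is_proc_symlink())
```

On a live system the same functions can be fed the contents of /proc/mounts directly. Autofs-managed mount points show up there with file system type `autofs`, and the extract above contains none.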
Is this one mounted by a program map similar to what you described earlier?
Do you then rely on NFS cross-device automounting to mount the user mounts?
Or is this one mounted outside of autofs and used by autofs automounts? Or
are you using autofs to mount this one and some other map to mount the other
mounts?

> http12://user1 /home/user1 nfs4 rw,nosuid,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp6,timeo=1000,retrans=2,sec=sys,clientaddr=2a00:42:1:50:1::1,local_lock=none,addr=2a00:42:1:20:1::1 0 0
> http12://user2 /home/user2 nfs4 rw,nosuid,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp6,timeo=1000,retrans=2,sec=sys,clientaddr=2a00:42:1:50:1::1,local_lock=none,addr=2a00:42:1:20:1::1 0 0

But then there are these, which don't seem to be related to the mount above;
perhaps they reference the first mount. Can you describe again how this fits
together?

At this point a full debug log would probably answer most of my questions.

>
> Also, I couldn't find any blocked mount process that would explain the
> "deadlock". I had a 'ps aux|grep mount' done every 10 seconds:
>
> Mon Jun 27 05:00:00 CEST 2016
> root 3437 0.0 0.0 218676 5500 ? Ssl Jun24 0:26 /usr/sbin/automount --pid-file /var/run/autofs.pid
>
> Mon Jun 27 05:00:10 CEST 2016
> root 3437 0.0 0.0 218676 5500 ? Ssl Jun24 0:27 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618146 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618214 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618215 0.0 0.0 0 0 ? Z 05:00 0:00 [umount] <defunct>
> root 2618224 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618227 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618230 0.0 0.0 0 0 ? Z 05:00 0:00 [umount] <defunct>
> root 2618240 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618248 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618250 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618252 0.1 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
>
> Mon Jun 27 05:00:20 CEST 2016
> root 3437 0.0 0.0 218676 5500 ? Ssl Jun24 0:27 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618146 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618214 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618224 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618227 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618240 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618248 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618250 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618252 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618701 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root 2618702 0.0 0.0 218676 2168 ? S 05:00 0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
>
> And it remained in that state afterwards. I don't know if the defunct
> umount processes are suspicious; I guess not.
>
> One last thing: a manual umount of /home/userY was done by a script at
> 6:26 (/home/userY was NOT mounted though), and it remained blocked. I'm
> not sure if it's a consequence of autofs being blocked or something else.
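The `ps aux|grep mount` polling above can be post-processed to separate two things: zombie (`Z`) `umount` entries, which are processes that have exited but not yet been reaped by their parent, and automount helper processes that stay alive across polls. Below is a minimal Python sketch, assuming the standard `ps aux` column order. The helper names and `daemon_pid=3437` (the main daemon in the listing) are illustrative choices, not part of autofs. The sample data are the 05:00:10 and 05:00:20 snapshots from this thread, rejoined onto single lines.

```python
from collections import namedtuple

# Hypothetical helpers for post-processing `ps aux` snapshots; not part of autofs.
Proc = namedtuple("Proc", "pid stat command")


def parse_ps_aux(text):
    """Parse `ps aux` lines: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND."""
    procs = []
    for line in text.splitlines():
        # COMMAND may contain spaces, so split into at most 11 fields.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip timestamps, headers and blank lines
        procs.append(Proc(int(fields[1]), fields[7], fields[10]))
    return procs


def zombies(procs):
    """PIDs in state Z (exited but not yet reaped by their parent)."""
    return [p.pid for p in procs if p.stat.startswith("Z")]


def persistent_helpers(earlier, later, daemon_pid):
    """automount processes, other than the main daemon, present in both snapshots."""
    def helpers(procs):
        return {p.pid for p in procs if "automount" in p.command and p.pid != daemon_pid}
    return sorted(helpers(earlier) & helpers(later))


AM = "/usr/sbin/automount --pid-file /var/run/autofs.pid"

SNAP_0010 = f"""\
Mon Jun 27 05:00:10 CEST 2016
root 3437 0.0 0.0 218676 5500 ? Ssl Jun24 0:27 {AM}
root 2618146 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618214 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618215 0.0 0.0 0 0 ? Z 05:00 0:00 [umount] <defunct>
root 2618224 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618227 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618230 0.0 0.0 0 0 ? Z 05:00 0:00 [umount] <defunct>
root 2618240 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618248 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618250 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618252 0.1 0.0 218676 2168 ? S 05:00 0:00 {AM}
"""

SNAP_0020 = f"""\
Mon Jun 27 05:00:20 CEST 2016
root 3437 0.0 0.0 218676 5500 ? Ssl Jun24 0:27 {AM}
root 2618146 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618214 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618224 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618227 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618240 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618248 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618250 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618252 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618701 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
root 2618702 0.0 0.0 218676 2168 ? S 05:00 0:00 {AM}
"""

if __name__ == "__main__":
    a, b = parse_ps_aux(SNAP_0010), parse_ps_aux(SNAP_0020)
    print("zombies at 05:00:10:", zombies(a))  # [2618215, 2618230]
    print("zombies at 05:00:20:", zombies(b))  # []
    print("helpers seen in both:", persistent_helpers(a, b, daemon_pid=3437))
```

In this data the two zombies are gone by 05:00:20, while all eight helpers from 05:00:10 are still present. That persistence is consistent with the helpers being blocked, though on its own it does not show where they are blocked.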