Re: Regular deadlocks

On Mon, 2016-06-27 at 16:04 +0200, Cyril B. wrote:
> On 06/27/2016 02:26 AM, Ian Kent wrote:
> > How is autofs configured?
> > 
> > If --disable-mount-locking is not used then any mount can block all other
> > mounts; if it is used then there can be mtab corruption if still using a
> > text-based mtab.
> 
> I use --disable-mount-locking.
> 
> > I always use --disable-mount-locking and nowadays the mtab is usually a
> > symlink into the proc file system so corruption isn't a problem.
> 
> /etc/mtab is actually not a symlink on my systems.
> 
> 
> Anyway, I have more details for you as the issue appeared today and I 
> could investigate some more. This is on a server that only mounts one 
> single NFS server (http12), so the multi-servers blocking issue is 
> irrelevant here.
> 
> A few minutes before the "deadlock" occurred, /nfs/http12 was unmounted 
> by autofs, I assume because it was idle. I have TIMEOUT=600. That 
> explains why the issue appears much more frequently on a server which is 
> way less busy (and usually in the middle of the night): the NFS server 
> needs to be idle enough to be unmounted.
> 
> However, I still had many /home/userX mounted (by autofs), which point 
> to /nfs/http12/userX. Shouldn't autofs not unmount /nfs/http12 when at 
> least one /home/userX is mounted? To be clear, here's an extract from my 
> /proc/mounts BEFORE the NFS server is unmounted by autofs:

This doesn't look like the full picture.
Where are the autofs file system mounts?

Mounted doesn't necessarily mean busy or in use but that also depends on the
mount hierarchy and how it has been constructed.

And symlinks are never "busy": they can't be a pwd and they aren't opened as
a file.
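
To show what I mean by the autofs file system mounts, here is a minimal sketch
that pulls every entry of type autofs out of /proc/mounts. The function name
autofs_mounts is made up for illustration; it only assumes the usual
whitespace-separated /proc/mounts layout (device, mount point, type, options,
dump, pass).

```python
def autofs_mounts(mounts_text):
    """Return (mount_point, options) for each autofs entry in /proc/mounts-style text."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Expected fields: device, mount point, fs type, options, dump, pass.
        if len(fields) >= 4 and fields[2] == "autofs":
            found.append((fields[1], fields[3]))
    return found


if __name__ == "__main__":
    # Print the autofs trigger mounts currently known to the kernel.
    with open("/proc/mounts") as f:
        for mount_point, options in autofs_mounts(f.read()):
            print(mount_point, options)
```

Including those lines alongside the nfs4 entries would show how the hierarchy
is put together.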

> 
> http12:/ /nfs/http12 nfs4 rw,nosuid,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp6,timeo=1000,retrans=2,sec=sys,clientaddr=2a00:42:1:50:1::1,local_lock=none,addr=2a00:42:1:20:1::1 0 0

This looks like an nfs4 fsid 0 mount.

Is this one mounted by a program map similar to what you described earlier?
Do you then rely on the nfs cross device automounting to mount the user mounts?

Or is this one mounted external to autofs and used by autofs automounts?
Or are you using autofs to mount this one and some other map to mount the other
mounts?

> http12://user1 /home/user1 nfs4 rw,nosuid,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp6,timeo=1000,retrans=2,sec=sys,clientaddr=2a00:42:1:50:1::1,local_lock=none,addr=2a00:42:1:20:1::1 0 0
> http12://user2 /home/user2 nfs4 rw,nosuid,noatime,vers=4.0,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp6,timeo=1000,retrans=2,sec=sys,clientaddr=2a00:42:1:50:1::1,local_lock=none,addr=2a00:42:1:20:1::1 0 0

But then there are these, which don't seem to be related to the mount above;
perhaps they reference the first mount.

Can you describe again how this fits together?

At this point a full debug log would probably answer most of my questions.

> 
> 
> Also, I couldn't find any blocked mount process that would explain the 
> "deadlock". I had a 'ps aux|grep mount' done every 10 seconds:
> 
> Mon Jun 27 05:00:00 CEST 2016
> root        3437  0.0  0.0 218676  5500 ?        Ssl  Jun24   0:26 /usr/sbin/automount --pid-file /var/run/autofs.pid
> 
> Mon Jun 27 05:00:10 CEST 2016
> root        3437  0.0  0.0 218676  5500 ?        Ssl  Jun24   0:27 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618146  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618214  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618215  0.0  0.0      0     0 ?        Z    05:00   0:00 [umount] <defunct>
> root     2618224  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618227  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618230  0.0  0.0      0     0 ?        Z    05:00   0:00 [umount] <defunct>
> root     2618240  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618248  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618250  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618252  0.1  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> 
> Mon Jun 27 05:00:20 CEST 2016
> root        3437  0.0  0.0 218676  5500 ?        Ssl  Jun24   0:27 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618146  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618214  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618224  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618227  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618240  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618248  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618250  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618252  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618701  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> root     2618702  0.0  0.0 218676  2168 ?        S    05:00   0:00 /usr/sbin/automount --pid-file /var/run/autofs.pid
> 
> And it remained in that state afterwards. I don't know if the defunct 
> umount processes are suspicious; I guess not.
> 
> One last thing: a manual umount of /home/userY was done by a script at 
> 6:26 (/home/userY was NOT mounted though), and it remained blocked. I'm 
> not sure if it's a consequence of autofs being blocked or something else.
> 
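
On the ps sampling: a process blocked in the kernel normally shows up in state
D (uninterruptible sleep), while Z is only a zombie waiting to be reaped. A
rough sketch for filtering `ps aux` output down to those two states follows;
the function name suspicious_processes and the "mount" keyword are
illustrative choices, and it assumes the standard 11-column `ps aux` layout.

```python
import subprocess


def suspicious_processes(ps_text, keyword="mount"):
    """Return (pid, stat, command) for D- or Z-state processes whose command contains keyword."""
    hits = []
    for line in ps_text.splitlines():
        # ps aux columns: USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skips the header, timestamps and blank lines
        stat, command = fields[7], fields[10]
        if keyword in command and stat[0] in "DZ":
            hits.append((int(fields[1]), stat, command))
    return hits


if __name__ == "__main__":
    # Take one sample of the live process table and report matches.
    out = subprocess.run(["ps", "aux"], capture_output=True, text=True).stdout
    for pid, stat, command in suspicious_processes(out):
        print(pid, stat, command)
```

Any D-state entry captured around the time of the hang, such as the manual
umount of /home/userY, would be worth including with the debug log.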