Re: When ghosting is enabled, failed mounts by AutoFS leave broken directories

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2012-09-26 at 16:39 -0300, Leonardo Chiquitto wrote:
> On Wed, Sep 26, 2012 at 6:15 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
> > On Tue, 2012-09-25 at 18:42 -0300, Leonardo Chiquitto wrote:
> >> On Mon, Sep 24, 2012 at 11:13 PM, Ian Kent <raven@xxxxxxxxxx> wrote:
> >> > On Mon, 2012-09-24 at 17:14 -0300, Leonardo Chiquitto wrote:
> >> >> Hi,
> >> >>
> >> >> If you try to access an automounted directory and the mount operation fails,
> >> >> the attributes of the directory (owner, group, mode, access time) are lost.
> >> >> This happens only when AutoFS' browse mode (ghosting) is enabled.
> >> >>
> >> >> How to reproduce the problem:
> >> >>
> >> >> Make sure you have BROWSE_MODE="yes" in /etc/sysconfig/autofs.
> >> >>
> >> >> 1. Add a simple indirect mount map (it's required that the server does NOT
> >> >>    export the /notexported directory):
> >> >>
> >> >> # cat /etc/auto.master
> >> >> /nfs    /etc/auto.nfs
> >> >>
> >> >> # cat /etc/auto.nfs
> >> >> v2      server:/notexported
> >> >>
> >> >> 2. Restart AutoFS and try to access the directory:
> >> >>
> >> >> # cd /nfs/v2
> >> >> -bash: cd: /nfs/v2: No such file or directory
> >> >>
> >> >> # ls -la /nfs
> >> >> ls: cannot access /nfs/v2: No such file or directory
> >> >> total 4
> >> >> drwxr-xr-x  3 root root    0 Sep 24 21:43 .
> >> >> drwxr-xr-x 28 root root 4096 Sep 24 21:25 ..
> >> >> d?????????  ? ?    ?       ?            ? v2
> >> >>
> >> >> This problem is reproducible on recent upstream kernels such as 3.6.0-rc6.
> >> >> I found out that the last kernel version that does not exhibit the bug is
> >> >> 2.6.37. Between 2.6.37 and 2.6.38 there were quite some changes in
> >> >> AutoFS/VFS and my attempts to bisect the problem were frustrated by
> >> >> early crashes.
> >> >>
> >> >> Any ideas?
> >> >
> >> > Not really.
> >> >
> >> > It would be best to start with an full autofs debug log so we can see
> >> > what is happening within automount and work back from there. We'll
> >> > probably need to look at the dentry itself as well to see if we can get
> >> > any info about where it came from possibly, with a kernel dump.
> >> >
> >> > Ian
> >>
> >> Hi Ian,
> >>
> >> Here are the debug logs from automount and a snapshot of the dentry
> >> and inode of the "broken" directory (running 2.6.38).
> >>
> >> I tried to compare it with 2.6.37 (which works), but the structures
> >> changed a lot between these kernels.
> >
> > A great deal changed in 2.6.38 in the VFS.
> >
> > You really can't use 2.6.38 because there were a number of fixes that
> > went in after this and I think the VFS code up to around 3.0 or 3.1
> > could have had this problem anyway.
> 
> I'm using 3.6.0-rc6 here. I collected the data from 2.6.38 because this
> is the first kernel that shows the bug and I thought it would be easier
> to isolate the fix this way (smaller distance).
> 
> > What your looking for is kernels with fs/namei.c:follow_automount() with
> > this near the top of the function, which shouldn't cause this type of
> > problem:
> >
> >        /* We don't want to mount if someone's just doing a stat -
> >          * unless they're stat'ing a directory and appended a '/' to
> >          * the name.
> >          *
> >          * We do, however, want to mount if someone wants to open or
> >          * create a file of any type under the mountpoint, wants to
> >          * traverse through the mountpoint or wants to open the
> >          * mounted directory.  Also, autofs may mark negative dentries
> >          * as being automount points.  These will need the attentions
> >          * of the daemon to instantiate them before they can be used.
> >          */
> >         if (!(flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
> >                      LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
> >             path->dentry->d_inode)
> >                 return -EISDIR;
> >
> > But I'm seeing this with 3.4.9-2.fc16 which does have this in
> > follow_automount() so that's a puzzle.
> >
> >>
> >> ## starting up
> >>
> >> Sep 25 13:44:30 n1 automount[2337]: Starting automounter version
> >> 5.0.7, master map auto.master
> >> Sep 25 13:44:30 n1 automount[2337]: using kernel protocol version 5.02
> >> Sep 25 13:44:30 n1 automount[2337]: lookup_nss_read_master: reading
> >> master files auto.master
> >> Sep 25 13:44:30 n1 automount[2337]: parse_init: parse(sun): init
> >> gathered global options: (null)
> >> Sep 25 13:44:30 n1 automount[2337]: spawn_mount: mtab link detected,
> >> passing -n to mount
> >> Sep 25 13:44:30 n1 automount[2337]: spawn_umount: mtab link detected,
> >> passing -n to mount
> >> Sep 25 13:44:30 n1 automount[2337]: lookup_read_master: lookup(file):
> >> read entry /nfs
> >> Sep 25 13:44:30 n1 automount[2337]: master_do_mount: mounting /nfs
> >> Sep 25 13:44:30 n1 automount[2337]: automount_path_to_fifo: fifo name
> >> /run/autofs.fifo-nfs
> >> Sep 25 13:44:30 n1 automount[2337]: lookup_nss_read_map: reading map
> >> file /etc/auto.nfs
> >> Sep 25 13:44:30 n1 automount[2337]: parse_init: parse(sun): init
> >> gathered global options: (null)
> >> Sep 25 13:44:30 n1 automount[2337]: spawn_mount: mtab link detected,
> >> passing -n to mount
> >> Sep 25 13:44:30 n1 automount[2337]: spawn_umount: mtab link detected,
> >> passing -n to mount
> >> Sep 25 13:44:30 n1 automount[2337]: mounted indirect on /nfs with
> >> timeout 60, freq 15 seconds
> >> Sep 25 13:44:30 n1 automount[2337]: st_ready: st_ready(): state = 0 path /nfs
> >> Sep 25 13:44:30 n1 automount[2337]: ghosting enabled
> >>
> >> ## ls /nfs/v2
> >>
> >> Sep 25 13:44:44 n1 automount[2337]: handle_packet: type = 3
> >> Sep 25 13:44:44 n1 automount[2337]: handle_packet_missing_indirect:
> >> token 6, name v2, request pid 2352
> >> Sep 25 13:44:44 n1 automount[2337]: attempting to mount entry /nfs/v2
> >> Sep 25 13:44:44 n1 automount[2337]: lookup_mount: lookup(file): looking up v2
> >> Sep 25 13:44:44 n1 automount[2337]: lookup_mount: lookup(file): v2 ->
> >> server:/nfs
> >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): expanded
> >> entry: server:/nfs
> >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): gathered options:
> >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun):
> >> dequote("server:/nfs") -> server:/nfs
> >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): core of
> >> entry: options=, loc=server:/nfs
> >> Sep 25 13:44:44 n1 automount[2337]: sun_mount: parse(sun): mounting
> >> root /nfs, mountpoint v2, what server:/nfs, fstype nfs, options (null)
> >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): root=/nfs
> >> name=v2 what=server:/nfs, fstype=nfs, options=(null)
> >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: called with host
> >> server(10.1.1.4) proto tcp version 0x30
> >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v3 rpc ping time: 0.000214
> >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v2 rpc ping time: 0.000114
> >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: host server cost 164 weight 0
> >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: called with host
> >> server(10.1.1.4) proto udp version 0x30
> >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v3 rpc ping time: 0.001836
> >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v2 rpc ping time: 0.000095
> >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: host server cost 965 weight 0
> >> Sep 25 13:44:44 n1 automount[2337]: prune_host_list: selected subset
> >> of hosts that support NFS3 over TCP
> >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): calling
> >> mkdir_path /nfs/v2
> >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): calling
> >> mount -t nfs server:/nfs /nfs/v2
> >> Sep 25 13:44:44 n1 automount[2337]: spawn_mount: mtab link detected,
> >> passing -n to mount
> >> Sep 25 13:44:44 n1 checkproc: checkproc: Empty pid file
> >> /var/run/rpc.statd.pid for /usr/sbin/rpc.statd
> >> Sep 25 13:44:44 n1 start_daemon: start_daemon: Empty pid file
> >> /var/run/rpc.statd.pid for /usr/sbin/rpc.statd
> >> Sep 25 13:44:44 n1 rpc.statd[2369]: Version 1.2.6 starting
> >> Sep 25 13:44:44 n1 rpc.statd[2369]: Flags: TI-RPC
> >> Sep 25 13:44:44 n1 automount[2337]: >> Starting rpc.statd .....done
> >> Sep 25 13:44:44 n1 automount[2337]: >> mount.nfs: access denied by
> >> server while mounting server:/nfs
> >> Sep 25 13:44:44 n1 automount[2337]: mount(nfs): nfs: mount failure
> >> server:/nfs on /nfs/v2
> >> Sep 25 13:44:44 n1 automount[2337]: dev_ioctl_send_fail: token = 6
> >> Sep 25 13:44:44 n1 automount[2337]: failed to mount /nfs/v2
> >> Sep 25 13:44:54 n1 automount[2337]: st_expire: state 1 path /nfs
> >> Sep 25 13:44:54 n1 automount[2337]: expire_proc: exp_proc =
> >> 140484187907840 path /nfs
> >> Sep 25 13:44:54 n1 automount[2337]: expire_cleanup: got thid
> >> 140484187907840 path /nfs stat 0
> >> Sep 25 13:44:54 n1 automount[2337]: expire_cleanup: sigchld: exp
> >> 140484187907840 finished, switching from 2 to 1
> >> Sep 25 13:44:54 n1 automount[2337]: st_ready: st_ready(): state = 2 path /nfs
> >> Sep 25 13:45:09 n1 automount[2337]: st_expire: state 1 path /nfs
> >> Sep 25 13:45:09 n1 automount[2337]: expire_proc: exp_proc =
> >> 140484187907840 path /nfs
> >> Sep 25 13:45:09 n1 automount[2337]: expire_cleanup: got thid
> >> 140484187907840 path /nfs stat 0
> >> Sep 25 13:45:09 n1 automount[2337]: expire_cleanup: sigchld: exp
> >> 140484187907840 finished, switching from 2 to 1
> >> Sep 25 13:45:09 n1 automount[2337]: st_ready: st_ready(): state = 2 path /nfs
> >>
> >> ## ls -la /nfs
> >>
> >> Sep 25 13:45:12 n1 automount[2337]: handle_packet: type = 3
> >> Sep 25 13:45:12 n1 automount[2337]: handle_packet_missing_indirect:
> >> token 7, name v2, request pid 2397
> >> Sep 25 13:45:12 n1 automount[2337]: dev_ioctl_send_fail: token = 7
> >
> > This callback shouldn't happen.
> >
> >>
> >> dentry and inode (/nfs/v2) from a v2.6.38 kernel, after triggering
> >> the mount:
> >
> > This certainly looks like an autofs dentry.
> >
> >>
> >> struct dentry {
> >>   d_flags = 393216,
> >>   d_seq = {
> >>     sequence = 4
> >>   },
> >>   d_hash = {
> >>     next = 0x0,
> >>     pprev = 0xffffc900011f5680
> >>   },
> >
> > This looks a bit odd but I don't think it is the cause of the problem.
> >
> > Other than that these look ok.
> >
> > snip ...
> >
> > I setup a simple map, auto.test, with a single key "this" to check this
> > out and did an "strace -ff -o ls.trace /bin/ls --color=auto /test"
> > against the 3.4.9-2.fc16 I currently have installed.
> >
> > Ignoring all the cruft, near the bottom we see:
> > stat64("/test", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> > openat(AT_FDCWD, "/test", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
> > getdents64(3, /* 3 entries */, 32768)   = 72
> > lstat64("/test/this", 0x95aba40)        = -1 ENOENT (No such file or directory)
> 
> This happens only after there was an attempt to mount /test/this
> and the directory is already in the "incomplete" state, right?

But the directory is complete before the failed mount and should remain
so when using the "browse" option. You an see that by listing the
contents of the directory before the mount attempt.

> 
> > This is the problem I think, it should not attempt a mount and should
> > return 0.
> 
> My theory about the root cause is that when the mount fails, the
> kernel forgets to "reset" the state of that dentry. Makes any sense?

Not really. There's no need to reset anything, all the kernel does is do
a path walk to obtain the existing dentry and fill in the information
from it. But the lstat gets a fail on the walk and doesn't fill in the
information so user space doesn't get it.

The problem is the fact that there's a call back occurring which
shouldn't happen so maybe something (perhaps the daemon) has removed the
directory when it shouldn't have. That would cause a call back to happen
and when it sees a fail the remove isn't done because of the browse
option. Essentially an unmatched create/remove perhaps but with a prior
incorrect remove somewhere. I looked a bit but can't see how that would
happen either.

> 
> > I've only looked at the current upstream source so far and I can't see
> > why this is happening.
> >
> > I've built upstream 3.5.4 so I can look at the actual source of the
> > kernel I'm using to test.
> 
> Thanks for your help
> Leonardo


--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux