Re: When ghosting is enabled, failed mounts by AutoFS leave broken directories

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 27, 2012 at 1:35 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
> On Thu, 2012-09-27 at 09:04 +0800, Ian Kent wrote:
>> On Wed, 2012-09-26 at 16:39 -0300, Leonardo Chiquitto wrote:
>> > On Wed, Sep 26, 2012 at 6:15 AM, Ian Kent <raven@xxxxxxxxxx> wrote:
>> > > On Tue, 2012-09-25 at 18:42 -0300, Leonardo Chiquitto wrote:
>> > >> On Mon, Sep 24, 2012 at 11:13 PM, Ian Kent <raven@xxxxxxxxxx> wrote:
>> > >> > On Mon, 2012-09-24 at 17:14 -0300, Leonardo Chiquitto wrote:
>> > >> >> Hi,
>> > >> >>
>> > >> >> If you try to access an automounted directory and the mount operation fails,
>> > >> >> the attributes of the directory (owner, group, mode, access time) are lost.
>> > >> >> This happens only when AutoFS' browse mode (ghosting) is enabled.
>> > >> >>
>> > >> >> How to reproduce the problem:
>> > >> >>
>> > >> >> Make sure you have BROWSE_MODE="yes" in /etc/sysconfig/autofs.
>> > >> >>
>> > >> >> 1. Add a simple indirect mount map (it's required that the server does NOT
>> > >> >>    export the /notexported directory):
>> > >> >>
>> > >> >> # cat /etc/auto.master
>> > >> >> /nfs    /etc/auto.nfs
>> > >> >>
>> > >> >> # cat /etc/auto.nfs
>> > >> >> v2      server:/notexported
>> > >> >>
>> > >> >> 2. Restart AutoFS and try to access the directory:
>> > >> >>
>> > >> >> # cd /nfs/v2
>> > >> >> -bash: cd: /nfs/v2: No such file or directory
>> > >> >>
>> > >> >> # ls -la /nfs
>> > >> >> ls: cannot access /nfs/v2: No such file or directory
>> > >> >> total 4
>> > >> >> drwxr-xr-x  3 root root    0 Sep 24 21:43 .
>> > >> >> drwxr-xr-x 28 root root 4096 Sep 24 21:25 ..
>> > >> >> d?????????  ? ?    ?       ?            ? v2
>> > >> >>
>> > >> >> This problem is reproducible on recent upstream kernels such as 3.6.0-rc6.
>> > >> >> I found out that the last kernel version that does not exhibit the bug is
>> > >> >> 2.6.37. Between 2.6.37 and 2.6.38 there were quite some changes in
>> > >> >> AutoFS/VFS and my attempts to bisect the problem were frustrated by
>> > >> >> early crashes.
>> > >> >>
>> > >> >> Any ideas?
>> > >> >
>> > >> > Not really.
>> > >> >
>> > >> > It would be best to start with an full autofs debug log so we can see
>> > >> > what is happening within automount and work back from there. We'll
>> > >> > probably need to look at the dentry itself as well to see if we can get
>> > >> > any info about where it came from possibly, with a kernel dump.
>> > >> >
>> > >> > Ian
>> > >>
>> > >> Hi Ian,
>> > >>
>> > >> Here are the debug logs from automount and a snapshot of the dentry
>> > >> and inode of the "broken" directory (running 2.6.38).
>> > >>
>> > >> I tried to compare it with 2.6.37 (which works), but the structures
>> > >> changed a lot between these kernels.
>> > >
>> > > A great deal changed in 2.6.38 in the VFS.
>> > >
>> > > You really can't use 2.6.38 because there were a number of fixes that
>> > > went in after this and I think the VFS code up to around 3.0 or 3.1
>> > > could have had this problem anyway.
>> >
>> > I'm using 3.6.0-rc6 here. I collected the data from 2.6.38 because this
>> > is the first kernel that shows the bug and I thought it would be easier
>> > to isolate the fix this way (smaller distance).
>> >
>> > > What your looking for is kernels with fs/namei.c:follow_automount() with
>> > > this near the top of the function, which shouldn't cause this type of
>> > > problem:
>> > >
>> > >        /* We don't want to mount if someone's just doing a stat -
>> > >          * unless they're stat'ing a directory and appended a '/' to
>> > >          * the name.
>> > >          *
>> > >          * We do, however, want to mount if someone wants to open or
>> > >          * create a file of any type under the mountpoint, wants to
>> > >          * traverse through the mountpoint or wants to open the
>> > >          * mounted directory.  Also, autofs may mark negative dentries
>> > >          * as being automount points.  These will need the attentions
>> > >          * of the daemon to instantiate them before they can be used.
>> > >          */
>> > >         if (!(flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY |
>> > >                      LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) &&
>> > >             path->dentry->d_inode)
>> > >                 return -EISDIR;
>> > >
>> > > But I'm seeing this with 3.4.9-2.fc16 which does have this in
>> > > follow_automount() so that's a puzzle.
>> > >
>> > >>
>> > >> ## starting up
>> > >>
>> > >> Sep 25 13:44:30 n1 automount[2337]: Starting automounter version
>> > >> 5.0.7, master map auto.master
>> > >> Sep 25 13:44:30 n1 automount[2337]: using kernel protocol version 5.02
>> > >> Sep 25 13:44:30 n1 automount[2337]: lookup_nss_read_master: reading
>> > >> master files auto.master
>> > >> Sep 25 13:44:30 n1 automount[2337]: parse_init: parse(sun): init
>> > >> gathered global options: (null)
>> > >> Sep 25 13:44:30 n1 automount[2337]: spawn_mount: mtab link detected,
>> > >> passing -n to mount
>> > >> Sep 25 13:44:30 n1 automount[2337]: spawn_umount: mtab link detected,
>> > >> passing -n to mount
>> > >> Sep 25 13:44:30 n1 automount[2337]: lookup_read_master: lookup(file):
>> > >> read entry /nfs
>> > >> Sep 25 13:44:30 n1 automount[2337]: master_do_mount: mounting /nfs
>> > >> Sep 25 13:44:30 n1 automount[2337]: automount_path_to_fifo: fifo name
>> > >> /run/autofs.fifo-nfs
>> > >> Sep 25 13:44:30 n1 automount[2337]: lookup_nss_read_map: reading map
>> > >> file /etc/auto.nfs
>> > >> Sep 25 13:44:30 n1 automount[2337]: parse_init: parse(sun): init
>> > >> gathered global options: (null)
>> > >> Sep 25 13:44:30 n1 automount[2337]: spawn_mount: mtab link detected,
>> > >> passing -n to mount
>> > >> Sep 25 13:44:30 n1 automount[2337]: spawn_umount: mtab link detected,
>> > >> passing -n to mount
>> > >> Sep 25 13:44:30 n1 automount[2337]: mounted indirect on /nfs with
>> > >> timeout 60, freq 15 seconds
>> > >> Sep 25 13:44:30 n1 automount[2337]: st_ready: st_ready(): state = 0 path /nfs
>> > >> Sep 25 13:44:30 n1 automount[2337]: ghosting enabled
>> > >>
>> > >> ## ls /nfs/v2
>> > >>
>> > >> Sep 25 13:44:44 n1 automount[2337]: handle_packet: type = 3
>> > >> Sep 25 13:44:44 n1 automount[2337]: handle_packet_missing_indirect:
>> > >> token 6, name v2, request pid 2352
>> > >> Sep 25 13:44:44 n1 automount[2337]: attempting to mount entry /nfs/v2
>> > >> Sep 25 13:44:44 n1 automount[2337]: lookup_mount: lookup(file): looking up v2
>> > >> Sep 25 13:44:44 n1 automount[2337]: lookup_mount: lookup(file): v2 ->
>> > >> server:/nfs
>> > >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): expanded
>> > >> entry: server:/nfs
>> > >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): gathered options:
>> > >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun):
>> > >> dequote("server:/nfs") -> server:/nfs
>> > >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): core of
>> > >> entry: options=, loc=server:/nfs
>> > >> Sep 25 13:44:44 n1 automount[2337]: sun_mount: parse(sun): mounting
>> > >> root /nfs, mountpoint v2, what server:/nfs, fstype nfs, options (null)
>> > >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): root=/nfs
>> > >> name=v2 what=server:/nfs, fstype=nfs, options=(null)
>> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: called with host
>> > >> server(10.1.1.4) proto tcp version 0x30
>> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v3 rpc ping time: 0.000214
>> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v2 rpc ping time: 0.000114
>> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: host server cost 164 weight 0
>> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: called with host
>> > >> server(10.1.1.4) proto udp version 0x30
>> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v3 rpc ping time: 0.001836
>> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v2 rpc ping time: 0.000095
>> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: host server cost 965 weight 0
>> > >> Sep 25 13:44:44 n1 automount[2337]: prune_host_list: selected subset
>> > >> of hosts that support NFS3 over TCP
>> > >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): calling
>> > >> mkdir_path /nfs/v2
>> > >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): calling
>> > >> mount -t nfs server:/nfs /nfs/v2
>> > >> Sep 25 13:44:44 n1 automount[2337]: spawn_mount: mtab link detected,
>> > >> passing -n to mount
>> > >> Sep 25 13:44:44 n1 checkproc: checkproc: Empty pid file
>> > >> /var/run/rpc.statd.pid for /usr/sbin/rpc.statd
>> > >> Sep 25 13:44:44 n1 start_daemon: start_daemon: Empty pid file
>> > >> /var/run/rpc.statd.pid for /usr/sbin/rpc.statd
>> > >> Sep 25 13:44:44 n1 rpc.statd[2369]: Version 1.2.6 starting
>> > >> Sep 25 13:44:44 n1 rpc.statd[2369]: Flags: TI-RPC
>> > >> Sep 25 13:44:44 n1 automount[2337]: >> Starting rpc.statd .....done
>> > >> Sep 25 13:44:44 n1 automount[2337]: >> mount.nfs: access denied by
>> > >> server while mounting server:/nfs
>> > >> Sep 25 13:44:44 n1 automount[2337]: mount(nfs): nfs: mount failure
>> > >> server:/nfs on /nfs/v2
>> > >> Sep 25 13:44:44 n1 automount[2337]: dev_ioctl_send_fail: token = 6
>> > >> Sep 25 13:44:44 n1 automount[2337]: failed to mount /nfs/v2
>> > >> Sep 25 13:44:54 n1 automount[2337]: st_expire: state 1 path /nfs
>> > >> Sep 25 13:44:54 n1 automount[2337]: expire_proc: exp_proc =
>> > >> 140484187907840 path /nfs
>> > >> Sep 25 13:44:54 n1 automount[2337]: expire_cleanup: got thid
>> > >> 140484187907840 path /nfs stat 0
>> > >> Sep 25 13:44:54 n1 automount[2337]: expire_cleanup: sigchld: exp
>> > >> 140484187907840 finished, switching from 2 to 1
>> > >> Sep 25 13:44:54 n1 automount[2337]: st_ready: st_ready(): state = 2 path /nfs
>> > >> Sep 25 13:45:09 n1 automount[2337]: st_expire: state 1 path /nfs
>> > >> Sep 25 13:45:09 n1 automount[2337]: expire_proc: exp_proc =
>> > >> 140484187907840 path /nfs
>> > >> Sep 25 13:45:09 n1 automount[2337]: expire_cleanup: got thid
>> > >> 140484187907840 path /nfs stat 0
>> > >> Sep 25 13:45:09 n1 automount[2337]: expire_cleanup: sigchld: exp
>> > >> 140484187907840 finished, switching from 2 to 1
>> > >> Sep 25 13:45:09 n1 automount[2337]: st_ready: st_ready(): state = 2 path /nfs
>> > >>
>> > >> ## ls -la /nfs
>> > >>
>> > >> Sep 25 13:45:12 n1 automount[2337]: handle_packet: type = 3
>> > >> Sep 25 13:45:12 n1 automount[2337]: handle_packet_missing_indirect:
>> > >> token 7, name v2, request pid 2397
>> > >> Sep 25 13:45:12 n1 automount[2337]: dev_ioctl_send_fail: token = 7
>> > >
>> > > This callback shouldn't happen.
>> > >
>> > >>
>> > >> dentry and inode (/nfs/v2) from a v2.6.38 kernel, after triggering
>> > >> the mount:
>> > >
>> > > This certainly looks like an autofs dentry.
>> > >
>> > >>
>> > >> struct dentry {
>> > >>   d_flags = 393216,
>> > >>   d_seq = {
>> > >>     sequence = 4
>> > >>   },
>> > >>   d_hash = {
>> > >>     next = 0x0,
>> > >>     pprev = 0xffffc900011f5680
>> > >>   },
>> > >
>> > > This looks a bit odd but I don't think it is the cause of the problem.
>> > >
>> > > Other than that these look ok.
>> > >
>> > > snip ...
>> > >
>> > > I setup a simple map, auto.test, with a single key "this" to check this
>> > > out and did an "strace -ff -o ls.trace /bin/ls --color=auto /test"
>> > > against the 3.4.9-2.fc16 I currently have installed.
>> > >
>> > > Ignoring all the cruft, near the bottom we see:
>> > > stat64("/test", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
>> > > openat(AT_FDCWD, "/test", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3
>> > > getdents64(3, /* 3 entries */, 32768)   = 72
>> > > lstat64("/test/this", 0x95aba40)        = -1 ENOENT (No such file or directory)
>> >
>> > This happens only after there was an attempt to mount /test/this
>> > and the directory is already in the "incomplete" state, right?
>>
>> But the directory is complete before the failed mount and should remain
>> so when using the "browse" option. You an see that by listing the
>> contents of the directory before the mount attempt.
>>
>> >
>> > > This is the problem I think, it should not attempt a mount and should
>> > > return 0.
>> >
>> > My theory about the root cause is that when the mount fails, the
>> > kernel forgets to "reset" the state of that dentry. Makes any sense?
>>
>> Not really. There's no need to reset anything, all the kernel does is do
>> a path walk to obtain the existing dentry and fill in the information
>> from it. But the lstat gets a fail on the walk and doesn't fill in the
>> information so user space doesn't get it.
>>
>> The problem is the fact that there's a call back occurring which
>> shouldn't happen so maybe something (perhaps the daemon) has removed the
>> directory when it shouldn't have. That would cause a call back to happen
>> and when it sees a fail the remove isn't done because of the browse
>> option. Essentially an unmatched create/remove perhaps but with a prior
>> incorrect remove somewhere. I looked a bit but can't see how that would
>> happen either.
>
> I can confirm that there is no directory removal and re-create going on.
>
> Checking for that lead me to discover this mistake which fixes the
> symptom we're seeing:
>
> autofs4 - fix reset pending flag on mount fail
>
> From: Ian Kent <raven@xxxxxxxxxx>
>
> In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING
> mount pending flag is not cleared.
>
> One effect of this is when using the "browse" option, directory entry
> attributes show up with all "?"s due to the incorrect callback and
> subsequent failure return (when in fact no callback should be made).

Reported-and-tested-by: Leonardo Chiquitto <lchiquitto@xxxxxxx>

Ian,

I just tested the patch here and confirmed it works perfectly.

Thanks!
Leonardo

>  fs/autofs4/root.c |    6 ++++--
>  1 files changed, 4 insertions(+), 2 deletions(-)
>
>
> diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c
> index 75e5f1c..8c4292f 100644
> --- a/fs/autofs4/root.c
> +++ b/fs/autofs4/root.c
> @@ -392,10 +392,12 @@ static struct vfsmount *autofs4_d_automount(struct path *path)
>                 ino->flags |= AUTOFS_INF_PENDING;
>                 spin_unlock(&sbi->fs_lock);
>                 status = autofs4_mount_wait(dentry);
> -               if (status)
> -                       return ERR_PTR(status);
>                 spin_lock(&sbi->fs_lock);
>                 ino->flags &= ~AUTOFS_INF_PENDING;
> +               if (status) {
> +                       spin_unlock(&sbi->fs_lock);
> +                       return ERR_PTR(status);
> +               }
>         }
>  done:
>         if (!(ino->flags & AUTOFS_INF_EXPIRING)) {
>
>
--
To unsubscribe from this list: send the line "unsubscribe autofs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Filesystem Development]     [Linux Ext4]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux