On Thu, Sep 27, 2012 at 1:35 AM, Ian Kent <raven@xxxxxxxxxx> wrote: > On Thu, 2012-09-27 at 09:04 +0800, Ian Kent wrote: >> On Wed, 2012-09-26 at 16:39 -0300, Leonardo Chiquitto wrote: >> > On Wed, Sep 26, 2012 at 6:15 AM, Ian Kent <raven@xxxxxxxxxx> wrote: >> > > On Tue, 2012-09-25 at 18:42 -0300, Leonardo Chiquitto wrote: >> > >> On Mon, Sep 24, 2012 at 11:13 PM, Ian Kent <raven@xxxxxxxxxx> wrote: >> > >> > On Mon, 2012-09-24 at 17:14 -0300, Leonardo Chiquitto wrote: >> > >> >> Hi, >> > >> >> >> > >> >> If you try to access an automounted directory and the mount operation fails, >> > >> >> the attributes of the directory (owner, group, mode, access time) are lost. >> > >> >> This happens only when AutoFS' browse mode (ghosting) is enabled. >> > >> >> >> > >> >> How to reproduce the problem: >> > >> >> >> > >> >> Make sure you have BROWSE_MODE="yes" in /etc/sysconfig/autofs. >> > >> >> >> > >> >> 1. Add a simple indirect mount map (it's required that the server does NOT >> > >> >> export the /notexported directory): >> > >> >> >> > >> >> # cat /etc/auto.master >> > >> >> /nfs /etc/auto.nfs >> > >> >> >> > >> >> # cat /etc/auto.nfs >> > >> >> v2 server:/notexported >> > >> >> >> > >> >> 2. Restart AutoFS and try to access the directory: >> > >> >> >> > >> >> # cd /nfs/v2 >> > >> >> -bash: cd: /nfs/v2: No such file or directory >> > >> >> >> > >> >> # ls -la /nfs >> > >> >> ls: cannot access /nfs/v2: No such file or directory >> > >> >> total 4 >> > >> >> drwxr-xr-x 3 root root 0 Sep 24 21:43 . >> > >> >> drwxr-xr-x 28 root root 4096 Sep 24 21:25 .. >> > >> >> d????????? ? ? ? ? ? v2 >> > >> >> >> > >> >> This problem is reproducible on recent upstream kernels such as 3.6.0-rc6. >> > >> >> I found out that the last kernel version that does not exhibit the bug is >> > >> >> 2.6.37. Between 2.6.37 and 2.6.38 there were quite some changes in >> > >> >> AutoFS/VFS and my attempts to bisect the problem were frustrated by >> > >> >> early crashes. >> > >> >> >> > >> >> Any ideas? >> > >> > >> > >> > Not really. >> > >> > >> > >> > It would be best to start with an full autofs debug log so we can see >> > >> > what is happening within automount and work back from there. We'll >> > >> > probably need to look at the dentry itself as well to see if we can get >> > >> > any info about where it came from possibly, with a kernel dump. >> > >> > >> > >> > Ian >> > >> >> > >> Hi Ian, >> > >> >> > >> Here are the debug logs from automount and a snapshot of the dentry >> > >> and inode of the "broken" directory (running 2.6.38). >> > >> >> > >> I tried to compare it with 2.6.37 (which works), but the structures >> > >> changed a lot between these kernels. >> > > >> > > A great deal changed in 2.6.38 in the VFS. >> > > >> > > You really can't use 2.6.38 because there were a number of fixes that >> > > went in after this and I think the VFS code up to around 3.0 or 3.1 >> > > could have had this problem anyway. >> > >> > I'm using 3.6.0-rc6 here. I collected the data from 2.6.38 because this >> > is the first kernel that shows the bug and I thought it would be easier >> > to isolate the fix this way (smaller distance). >> > >> > > What your looking for is kernels with fs/namei.c:follow_automount() with >> > > this near the top of the function, which shouldn't cause this type of >> > > problem: >> > > >> > > /* We don't want to mount if someone's just doing a stat - >> > > * unless they're stat'ing a directory and appended a '/' to >> > > * the name. >> > > * >> > > * We do, however, want to mount if someone wants to open or >> > > * create a file of any type under the mountpoint, wants to >> > > * traverse through the mountpoint or wants to open the >> > > * mounted directory. Also, autofs may mark negative dentries >> > > * as being automount points. These will need the attentions >> > > * of the daemon to instantiate them before they can be used. >> > > */ >> > > if (!(flags & (LOOKUP_PARENT | LOOKUP_DIRECTORY | >> > > LOOKUP_OPEN | LOOKUP_CREATE | LOOKUP_AUTOMOUNT)) && >> > > path->dentry->d_inode) >> > > return -EISDIR; >> > > >> > > But I'm seeing this with 3.4.9-2.fc16 which does have this in >> > > follow_automount() so that's a puzzle. >> > > >> > >> >> > >> ## starting up >> > >> >> > >> Sep 25 13:44:30 n1 automount[2337]: Starting automounter version >> > >> 5.0.7, master map auto.master >> > >> Sep 25 13:44:30 n1 automount[2337]: using kernel protocol version 5.02 >> > >> Sep 25 13:44:30 n1 automount[2337]: lookup_nss_read_master: reading >> > >> master files auto.master >> > >> Sep 25 13:44:30 n1 automount[2337]: parse_init: parse(sun): init >> > >> gathered global options: (null) >> > >> Sep 25 13:44:30 n1 automount[2337]: spawn_mount: mtab link detected, >> > >> passing -n to mount >> > >> Sep 25 13:44:30 n1 automount[2337]: spawn_umount: mtab link detected, >> > >> passing -n to mount >> > >> Sep 25 13:44:30 n1 automount[2337]: lookup_read_master: lookup(file): >> > >> read entry /nfs >> > >> Sep 25 13:44:30 n1 automount[2337]: master_do_mount: mounting /nfs >> > >> Sep 25 13:44:30 n1 automount[2337]: automount_path_to_fifo: fifo name >> > >> /run/autofs.fifo-nfs >> > >> Sep 25 13:44:30 n1 automount[2337]: lookup_nss_read_map: reading map >> > >> file /etc/auto.nfs >> > >> Sep 25 13:44:30 n1 automount[2337]: parse_init: parse(sun): init >> > >> gathered global options: (null) >> > >> Sep 25 13:44:30 n1 automount[2337]: spawn_mount: mtab link detected, >> > >> passing -n to mount >> > >> Sep 25 13:44:30 n1 automount[2337]: spawn_umount: mtab link detected, >> > >> passing -n to mount >> > >> Sep 25 13:44:30 n1 automount[2337]: mounted indirect on /nfs with >> > >> timeout 60, freq 15 seconds >> > >> Sep 25 13:44:30 n1 automount[2337]: st_ready: st_ready(): state = 0 path /nfs >> > >> Sep 25 13:44:30 n1 automount[2337]: ghosting enabled >> > >> >> > >> ## ls /nfs/v2 >> > >> >> > >> Sep 25 13:44:44 n1 automount[2337]: handle_packet: type = 3 >> > >> Sep 25 13:44:44 n1 automount[2337]: handle_packet_missing_indirect: >> > >> token 6, name v2, request pid 2352 >> > >> Sep 25 13:44:44 n1 automount[2337]: attempting to mount entry /nfs/v2 >> > >> Sep 25 13:44:44 n1 automount[2337]: lookup_mount: lookup(file): looking up v2 >> > >> Sep 25 13:44:44 n1 automount[2337]: lookup_mount: lookup(file): v2 -> >> > >> server:/nfs >> > >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): expanded >> > >> entry: server:/nfs >> > >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): gathered options: >> > >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): >> > >> dequote("server:/nfs") -> server:/nfs >> > >> Sep 25 13:44:44 n1 automount[2337]: parse_mount: parse(sun): core of >> > >> entry: options=, loc=server:/nfs >> > >> Sep 25 13:44:44 n1 automount[2337]: sun_mount: parse(sun): mounting >> > >> root /nfs, mountpoint v2, what server:/nfs, fstype nfs, options (null) >> > >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): root=/nfs >> > >> name=v2 what=server:/nfs, fstype=nfs, options=(null) >> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: called with host >> > >> server(10.1.1.4) proto tcp version 0x30 >> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v3 rpc ping time: 0.000214 >> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v2 rpc ping time: 0.000114 >> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: host server cost 164 weight 0 >> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: called with host >> > >> server(10.1.1.4) proto udp version 0x30 >> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v3 rpc ping time: 0.001836 >> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: nfs v2 rpc ping time: 0.000095 >> > >> Sep 25 13:44:44 n1 automount[2337]: get_nfs_info: host server cost 965 weight 0 >> > >> Sep 25 13:44:44 n1 automount[2337]: prune_host_list: selected subset >> > >> of hosts that support NFS3 over TCP >> > >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): calling >> > >> mkdir_path /nfs/v2 >> > >> Sep 25 13:44:44 n1 automount[2337]: mount_mount: mount(nfs): calling >> > >> mount -t nfs server:/nfs /nfs/v2 >> > >> Sep 25 13:44:44 n1 automount[2337]: spawn_mount: mtab link detected, >> > >> passing -n to mount >> > >> Sep 25 13:44:44 n1 checkproc: checkproc: Empty pid file >> > >> /var/run/rpc.statd.pid for /usr/sbin/rpc.statd >> > >> Sep 25 13:44:44 n1 start_daemon: start_daemon: Empty pid file >> > >> /var/run/rpc.statd.pid for /usr/sbin/rpc.statd >> > >> Sep 25 13:44:44 n1 rpc.statd[2369]: Version 1.2.6 starting >> > >> Sep 25 13:44:44 n1 rpc.statd[2369]: Flags: TI-RPC >> > >> Sep 25 13:44:44 n1 automount[2337]: >> Starting rpc.statd .....done >> > >> Sep 25 13:44:44 n1 automount[2337]: >> mount.nfs: access denied by >> > >> server while mounting server:/nfs >> > >> Sep 25 13:44:44 n1 automount[2337]: mount(nfs): nfs: mount failure >> > >> server:/nfs on /nfs/v2 >> > >> Sep 25 13:44:44 n1 automount[2337]: dev_ioctl_send_fail: token = 6 >> > >> Sep 25 13:44:44 n1 automount[2337]: failed to mount /nfs/v2 >> > >> Sep 25 13:44:54 n1 automount[2337]: st_expire: state 1 path /nfs >> > >> Sep 25 13:44:54 n1 automount[2337]: expire_proc: exp_proc = >> > >> 140484187907840 path /nfs >> > >> Sep 25 13:44:54 n1 automount[2337]: expire_cleanup: got thid >> > >> 140484187907840 path /nfs stat 0 >> > >> Sep 25 13:44:54 n1 automount[2337]: expire_cleanup: sigchld: exp >> > >> 140484187907840 finished, switching from 2 to 1 >> > >> Sep 25 13:44:54 n1 automount[2337]: st_ready: st_ready(): state = 2 path /nfs >> > >> Sep 25 13:45:09 n1 automount[2337]: st_expire: state 1 path /nfs >> > >> Sep 25 13:45:09 n1 automount[2337]: expire_proc: exp_proc = >> > >> 140484187907840 path /nfs >> > >> Sep 25 13:45:09 n1 automount[2337]: expire_cleanup: got thid >> > >> 140484187907840 path /nfs stat 0 >> > >> Sep 25 13:45:09 n1 automount[2337]: expire_cleanup: sigchld: exp >> > >> 140484187907840 finished, switching from 2 to 1 >> > >> Sep 25 13:45:09 n1 automount[2337]: st_ready: st_ready(): state = 2 path /nfs >> > >> >> > >> ## ls -la /nfs >> > >> >> > >> Sep 25 13:45:12 n1 automount[2337]: handle_packet: type = 3 >> > >> Sep 25 13:45:12 n1 automount[2337]: handle_packet_missing_indirect: >> > >> token 7, name v2, request pid 2397 >> > >> Sep 25 13:45:12 n1 automount[2337]: dev_ioctl_send_fail: token = 7 >> > > >> > > This callback shouldn't happen. >> > > >> > >> >> > >> dentry and inode (/nfs/v2) from a v2.6.38 kernel, after triggering >> > >> the mount: >> > > >> > > This certainly looks like an autofs dentry. >> > > >> > >> >> > >> struct dentry { >> > >> d_flags = 393216, >> > >> d_seq = { >> > >> sequence = 4 >> > >> }, >> > >> d_hash = { >> > >> next = 0x0, >> > >> pprev = 0xffffc900011f5680 >> > >> }, >> > > >> > > This looks a bit odd but I don't think it is the cause of the problem. >> > > >> > > Other than that these look ok. >> > > >> > > snip ... >> > > >> > > I setup a simple map, auto.test, with a single key "this" to check this >> > > out and did an "strace -ff -o ls.trace /bin/ls --color=auto /test" >> > > against the 3.4.9-2.fc16 I currently have installed. >> > > >> > > Ignoring all the cruft, near the bottom we see: >> > > stat64("/test", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0 >> > > openat(AT_FDCWD, "/test", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY|O_CLOEXEC) = 3 >> > > getdents64(3, /* 3 entries */, 32768) = 72 >> > > lstat64("/test/this", 0x95aba40) = -1 ENOENT (No such file or directory) >> > >> > This happens only after there was an attempt to mount /test/this >> > and the directory is already in the "incomplete" state, right? >> >> But the directory is complete before the failed mount and should remain >> so when using the "browse" option. You an see that by listing the >> contents of the directory before the mount attempt. >> >> > >> > > This is the problem I think, it should not attempt a mount and should >> > > return 0. >> > >> > My theory about the root cause is that when the mount fails, the >> > kernel forgets to "reset" the state of that dentry. Makes any sense? >> >> Not really. There's no need to reset anything, all the kernel does is do >> a path walk to obtain the existing dentry and fill in the information >> from it. But the lstat gets a fail on the walk and doesn't fill in the >> information so user space doesn't get it. >> >> The problem is the fact that there's a call back occurring which >> shouldn't happen so maybe something (perhaps the daemon) has removed the >> directory when it shouldn't have. That would cause a call back to happen >> and when it sees a fail the remove isn't done because of the browse >> option. Essentially an unmatched create/remove perhaps but with a prior >> incorrect remove somewhere. I looked a bit but can't see how that would >> happen either. > > I can confirm that there is no directory removal and re-create going on. > > Checking for that lead me to discover this mistake which fixes the > symptom we're seeing: > > autofs4 - fix reset pending flag on mount fail > > From: Ian Kent <raven@xxxxxxxxxx> > > In autofs4_d_automount(), if a mount fail occurs the AUTOFS_INF_PENDING > mount pending flag is not cleared. > > One effect of this is when using the "browse" option, directory entry > attributes show up with all "?"s due to the incorrect callback and > subsequent failure return (when in fact no callback should be made). Reported-and-tested-by: Leonardo Chiquitto <lchiquitto@xxxxxxx> Ian, I just tested the patch here and confirmed it works perfectly. Thanks! Leonardo > fs/autofs4/root.c | 6 ++++-- > 1 files changed, 4 insertions(+), 2 deletions(-) > > > diff --git a/fs/autofs4/root.c b/fs/autofs4/root.c > index 75e5f1c..8c4292f 100644 > --- a/fs/autofs4/root.c > +++ b/fs/autofs4/root.c > @@ -392,10 +392,12 @@ static struct vfsmount *autofs4_d_automount(struct path *path) > ino->flags |= AUTOFS_INF_PENDING; > spin_unlock(&sbi->fs_lock); > status = autofs4_mount_wait(dentry); > - if (status) > - return ERR_PTR(status); > spin_lock(&sbi->fs_lock); > ino->flags &= ~AUTOFS_INF_PENDING; > + if (status) { > + spin_unlock(&sbi->fs_lock); > + return ERR_PTR(status); > + } > } > done: > if (!(ino->flags & AUTOFS_INF_EXPIRING)) { > > -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html