On Sun, Jan 16, 2011 at 12:30 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote: > >> > AFAICS, it keeps your write-free objectives and gets much saner API. >> > Shout if you have problems with that... >> >> No that sounds good, I don't have a problem with it, although I don't >> exactly understand what you're getting at in the second paragraph. > > OK, I have a hopefully sane implementation in tip of #untested. > > There's a fun problem with what you do in do_lookup(), BTW. Look: > we enter do_lookup() with LOOKUP_RCU. We find dentry in dcache, > everything's beautiful. The sucker has ->d_revalidate(). We go > to need_revalidate. There we call do_revalidate(). It calls > d_revalidate(), which calls ->d_revalidate() and instead of spitting > into your eye and returning -ECHILD it happily returns 1. So > do d_revalidate() and then do_revalidate(), without any further > actions. do_revalidate() has returned our dentry, which is neither > NULL nor ERR_PTR(), so back in do_lookup() we go to done. > > There we set path->mnt and path->dentry and call __follow_mount(). > And damn, it *is* a mountpoint. So we > * do dput() on dentry we'd never grabbed a reference to > * grab a reference to a different dentry (and remain in happy > belief that we are in LOOKUP_RCU mode, and thus don't need to drop it) > * grab a reference to vfsmount (via lookup_mnt()). Ditto (and > I haven't checked if grabbing vfsmount_lock twice shared isn't a recipe > for a deadlocky race with something grabbing it exclusive between these > nested shared grabs). > * if we are really unlucky and that mountpoint is, in turn, > overmounted, we'll hit mntput(). While under vfsmount_lock. > > AFAICS, it's badly b0rken. And autofs really steps into that mess. Ah, I think you're right there, good catch. > As minimum, we'd need to split need_revalidate: and done: in RCU and non-RCU > variants. I'm about to fall down right now after an all-nighter (and then > some); if you put something together before I get up, please throw it > my way. It also has a problem with jumping back to need_lookup case there too. I think separating out those two cases is reasonable. > > Note that the problem exists both in mainline and in mainline+automount > patches; in the latter it's a bit nastier, but in principle the situation > is the same, so I'd rather see a fix for that in front of automount queue. Definitely agree. I'm on the road at the moment so would much appreciate if you can cut the patch, but I could suggest something along the lines of: if (nd->flags & LOOKUP_RCU) { unsigned seq; *inode = nd->inode; dentry = __d_lookup_rcu(parent, name, &seq, inode); if (!dentry) goto need_lookup_rcu; /* Memory barrier in read_seqcount_begin of child is enough */ if (__read_seqcount_retry(&parent->d_seq, nd->seq)) return -ECHILD; nd->seq = seq; if (dentry->d_flags & DCACHE_OP_REVALIDATE) { dentry = do_revalidate(dentry, nd); if (!dentry) goto need_lookup_rcu; if (IS_ERR(dentry)) goto fail; } path->mnt = mnt; path->dentry = dentry; __follow_mount_rcu(nd, path, inode); } else { dentry = __d_lookup(parent, name); if (!dentry) goto need_lookup; found: if (dentry->d_flags & DCACHE_OP_REVALIDATE) { dentry = do_revalidate(dentry, nd); if (!dentry) goto need_lookup; if (IS_ERR(dentry)) goto fail; } path->mnt = mnt; path->dentry = dentry; __follow_mount(path); *inode = path->dentry->d_inode; } return 0; need_lookup_rcu: if (nameidata_drop_rcu(nd)) return -ECHILD; need_lookup: ... -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html