Re: [PATCH 19/18] Unexport do_add_mount() and add in follow_automount(), not ->d_automount()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, Jan 16, 2011 at 12:30 AM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
>> > AFAICS, it keeps your write-free objectives and gets much saner API.
>> > Shout if you have problems with that...
>>
>> No that sounds good, I don't have a problem with it, although I don't
>> exactly understand what you're getting at in the second paragraph.
>
> OK, I have a hopefully sane implementation in tip of #untested.
>
> There's a fun problem with what you do in do_lookup(), BTW.  Look:
> we enter do_lookup() with LOOKUP_RCU.  We find dentry in dcache,
> everything's beautiful.  The sucker has ->d_revalidate().  We go
> to need_revalidate.  There we call do_revalidate().  It calls
> d_revalidate(), which calls ->d_revalidate() and instead of spitting
> into your eye and returning -ECHILD it happily returns 1.  So
> do d_revalidate() and then do_revalidate(), without any further
> actions.  do_revalidate() has returned our dentry, which is neither
> NULL nor ERR_PTR(), so back in do_lookup() we go to done.
>
> There we set path->mnt and path->dentry and call __follow_mount().
> And damn, it *is* a mountpoint.  So we
>        * do dput() on dentry we'd never grabbed a reference to
>        * grab a reference to a different dentry (and remain in happy
> belief that we are in LOOKUP_RCU mode, and thus don't need to drop it)
>        * grab a reference to vfsmount (via lookup_mnt()).  Ditto (and
> I haven't checked if grabbing vfsmount_lock twice shared isn't a recipe
> for a deadlocky race with something grabbing it exclusive between these
> nested shared grabs).
>        * if we are really unlucky and that mountpoint is, in turn,
> overmounted, we'll hit mntput().  While under vfsmount_lock.
>
> AFAICS, it's badly b0rken.  And autofs really steps into that mess.

Ah, I think you're right there, good catch.


> As minimum, we'd need to split need_revalidate: and done: in RCU and non-RCU
> variants.  I'm about to fall down right now after an all-nighter (and then
> some); if you put something together before I get up, please throw it
> my way.

It also has a problem with jumping back to need_lookup case there too.
I think separating out those two cases is reasonable.

>
> Note that the problem exists both in mainline and in mainline+automount
> patches; in the latter it's a bit nastier, but in principle the situation
> is the same, so I'd rather see a fix for that in front of automount queue.

Definitely agree.

I'm on the road at the moment so would much appreciate if you can
cut the patch, but I could suggest something along the lines of:

	if (nd->flags & LOOKUP_RCU) {
		unsigned seq;

		*inode = nd->inode;
		dentry = __d_lookup_rcu(parent, name, &seq, inode);
		if (!dentry)
			goto need_lookup_rcu;
		/* Memory barrier in read_seqcount_begin of child is enough */
		if (__read_seqcount_retry(&parent->d_seq, nd->seq))
			return -ECHILD;

		nd->seq = seq;
		if (dentry->d_flags & DCACHE_OP_REVALIDATE) {
                        	dentry = do_revalidate(dentry, nd);
                                if (!dentry)
		                       goto need_lookup_rcu;
                                if (IS_ERR(dentry))
		                       goto fail;
                }
		path->mnt = mnt;
		path->dentry = dentry;
		__follow_mount_rcu(nd, path, inode);
	} else {
		dentry = __d_lookup(parent, name);
		if (!dentry)
			goto need_lookup;
found:
		if (dentry->d_flags & DCACHE_OP_REVALIDATE) {
                        	dentry = do_revalidate(dentry, nd);
                                if (!dentry)
		                       goto need_lookup;
                                if (IS_ERR(dentry))
		                       goto fail;
                }
		path->mnt = mnt;
		path->dentry = dentry;
		__follow_mount(path);
		*inode = path->dentry->d_inode;
	}
        return 0;

need_lookup_rcu:
        if (nameidata_drop_rcu(nd))
                return -ECHILD;
need_lookup:
       ...
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux