Re: [PATCH review 3/8] dcache: Clearly separate the two directory rename cases in d_splice_alias

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Sat, 15 Aug 2015 13:25:55 -0500

Al Viro <viro@xxxxxxxxxxxxxxxxxx> writes:

> On Thu, Aug 13, 2015 at 11:31:47PM -0500, Eric W. Biederman wrote:
>
>> The problem is that as the lookup locking stands today grabbing the
>> s_vfs_rename_mutex must use mutex_trylock which can fail, so for reliability
>> reasons we need to avoid using the rename_mutex as much as possible.
>
> I really don't like it.  For one thing, you *still* are taking it - have to,
> really.  So this argument is moot anyway.  

Not moot.  We don't take s_vfs_rename_mutex when we are connecting a
disconnected dentry alias.  If d_splice_alias could sleep and take
s_vfs_rename_mutex then we could have a single path through the code.

Unfortunately the code can't sleep when taking s_vfs_rename_mutex so
attempting to take s_vfs_rename_mutex for both paths will introduce
unnecessary -ESTALE failures into d_splice_alias.

> ESTALE can happen here.  For
> another, I'm not convinced that this "we don't need no stinkin'
> extra locks for attaching a detached subtree" is correct.  E.g. what's
> to protect that IS_ROOT(new) from changing right under you?

You are quite correct that I missed that nothing protects the result of
IS_ROOT(new).   So my change does introduce a case where we don't
hold the appropriate inode mutexes when renaming a dentry and that
introduces races elsewhere in the code so it is not acceptable.

But it is true that we only take rename_lock and don't take any
additional mutex when connecting a disconnected dentry.  Aka "we don't
need no stinkin' extra locks".  We clearly can not take the
new->d_parent->d_inode->i_mutex when IS_ROOT(new) as that is
meaningless.  Further I do not see a point in taking s_vfs_rename_mutex
in that case.

Not for this round but if you can see any reason why our not taking
s_vfs_rename_mutex when connecting disconnected dentries is wrong
and we need to take it and risk -ESTALE.  I would love to know because I
would love to clean up that mess.

> I'm not saying that it wouldn't benefit from cleaner locking - it sure
> as hell would.  But it's in a really incestous relationship with a lot
> of other pieces, both in fs/dcache.c and elsewhere.  Let's not mix that
> into anything else - driveby cleanups in that place are very likely to
> cause serious trouble.

Fair enough. 

I do have to touch d_splice_alias and change the locking, so
unfortunately I have to do some of the nasty locking analysis anyway.

I am keeping the i_lock cleanup because it is trivial and removes the
need to figure out if there is any existing ordering between i_lock and
mount_lock, and if taking mount_lock could induce a deadlock.  That
audit would not be trivial.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html