Re: [PATCH 08/11] vfs: Merge check_submounts_and_drop and d_invalidate

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes:

> On Tue, Feb 25, 2014 at 02:03:36PM -0800, Eric W. Biederman wrote:
>> "J. Bruce Fields" <bfields@xxxxxxxxxxxx> writes:
>> 
>> > On Mon, Feb 24, 2014 at 04:01:29PM -0800, Eric W. Biederman wrote:
>> >> Miklos Szeredi <miklos@xxxxxxxxxx> writes:
>> >> 
>> >> >
>> >> > You can optimize this by including the negative check within the above d_locked
>> >> > region and calling __d_drop() instead.
>> >> 
>> >> For this patch just moving the code and not changing it is the correct
>> >> thing to do because it helps with review and understanding the code.
>> >> 
>> >> There are two ways I could see going with optimizing the preamble.
>> >> Simply dropping the d_lock from around the d_unhashed test as a pointer
>> >> dereference should be atomic, and the test is racy against
>> >> d_materialise_unique.
>> >
>> > Could you explain?  What's the race, and what are the consequences?
>
> Actually I was just confused as to whether the above "is racy" was
> claiming the existence of some bug.
>
> I believe I should have read the above as more like "the test is already
> racy against d_materialise_unique, but it's a harmless race, and
> dropping the d_lock wouldn't make it any worse".
>
>> >> (We don't always hold the parent directory's inode mutex when d_invalidate is called).
>> 
>> d_unhashed is not a permanent condition because of d_materialise_unique,
>> and d_splice_alias.
>> 
>> d_invalidate can be called on an unhashed dentry in one of two ways
>> (either d_revalidate dropped the dentry or another routine that drops
>> the dentry beat the current invocation of d_invalidate to the job).
>> 
>> 
>> There are 3 places d_revalidate is called.
>> 
>> Once on the rcu path with the appropriate flag set.
>> 
>> Once without the parent i_mutex held, just off of the rcu path;
>> on that path d_invalidate is called when d_revalidate fails.
>> 
>> Once during lookup with the parent directory i_mutex held.
>> 
>> 
>> Because the parent directory's i_mutex is not always held across
>> d_revalidate and the following d_invalidate it happens that d_invalidate
>> is not always an atomic operation.
>> 
>> 
>> At worst the race results in a dentry that is dropped when it could be
>> hashed,
>
> Because somebody not holding the i_mutex calls d_invalidate based on old
> information and unhashes something that
> d_materialise_unique/d_splice_alias just hashed?

More likely today somebody not holding i_mutex and not in rcu context
calls d_revalidate.  d_revalidate drops the dentry and, before we call
d_invalidate, d_materialise_unique/d_splice_alias rehashes it.

After my changes it looks like it takes 3 processes: two instances
of d_invalidate and an instance of d_materialise_unique/d_splice_alias
to trigger this case.

In either case the window is very small and the outcome is effectively
harmless.  So I don't see this as a problem.
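
To make the interleaving concrete, here is a minimal user-space sketch
that replays it step by step.  It is not kernel code: struct toy_dentry
and every toy_* helper are hypothetical stand-ins that model only the
"hashed" bit, with no locking and no real concurrency, so it shows the
ordering and the outcome rather than the actual implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry: tracks only whether it is hashed. */
struct toy_dentry {
	bool hashed;
};

/* Models d_drop(): remove the dentry from the hash. */
static void toy_drop(struct toy_dentry *d)
{
	d->hashed = false;
}

/* Models d_materialise_unique()/d_splice_alias() putting it back. */
static void toy_rehash(struct toy_dentry *d)
{
	d->hashed = true;
}

/*
 * Models the d_invalidate preamble: an already unhashed dentry is left
 * alone, otherwise it is dropped.  Returns true if this call unhashed it.
 */
static bool toy_invalidate(struct toy_dentry *d)
{
	if (!d->hashed)
		return false;
	toy_drop(d);
	return true;
}

/* Replays the race in a fixed order; returns the final hashed state. */
static bool toy_replay_race(void)
{
	struct toy_dentry d = { .hashed = true };

	toy_drop(&d);		/* 1. d_revalidate drops the dentry */
	toy_rehash(&d);		/* 2. a concurrent lookup rehashes it */
	toy_invalidate(&d);	/* 3. d_invalidate runs on stale information */

	return d.hashed;
}

/* Same sequence followed by the next lookup, which resurrects the dentry. */
static bool toy_replay_recovery(void)
{
	struct toy_dentry d = { .hashed = true };

	toy_drop(&d);
	toy_rehash(&d);
	toy_invalidate(&d);
	toy_rehash(&d);		/* 4. next lookup hashes it again */

	return d.hashed;
}
```

In this model toy_replay_race() ends with the dentry unhashed even though
it could have stayed hashed, and toy_replay_recovery() ends with it
hashed again, which is the "dropped when it could be hashed, then
resurrected on the next lookup" outcome.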

>> that we will resurrect next time someone attempts to look it
>> up and d_materialise_unique/d_splice_alias is called.
>
> OK.
>
>> None of that really matters for optimizing d_invalidate, but it is part
>> of the background in which d_invalidate lives. All that is significant
>> in d_invalidate is knowing that d_materialise_unique, and possibly
>> d_splice_alias may run concurrently with d_invalidate.  It is unlikely
>> and essentially harmless.
>> 
>> 
>> After my patchset (because I removed all of the d_drop's from
>> .d_revalidate) the only race that should remain is between two parallel
>> calls of d_invalidate.  Which probably means we can remove the test for
>> d_unhashed altogether.
>> 
>> Right now I just want to make this first big step and make certain the
>> code is solid.  After that optimization is easy.
>
> Thanks for the explanation!

Welcome.

Eric
