Re: races in ll_splice_alias() and elsewhere (ext4, ocfs2)

"Drokin, Oleg" <oleg.drokin@xxxxxxxxx> · Thu, 10 Mar 2016 05:47:48 +0000

On Mar 9, 2016, at 11:43 PM, Al Viro wrote:

> On Thu, Mar 10, 2016 at 03:46:43AM +0000, Drokin, Oleg wrote:
> 
>>> Wait a minute.  If it's hashed, has the right name and the right parent,
>>> why the hell are we calling ->lookup() on a new dentry in the first place?
>>> Why hadn't we simply picked it from dcache?
>> 
>> This is because of the trickery we do in the d_compare.
>> our d_compare looks at the "invalid" flag and if it's set, returns "not matching",
>> triggering the lookup instead of revalidate.
>> This makes revalidate simple and fast.
>> (We used to have a complicated revalidate with a lot of code duplication with
>> lookup in order to be able to query the server and pass all sorts of data there
>> and it was nothing but trouble).
> 
> *Ugh*...  That's really nasty.  We certainly could make d_exact_match()
> accept unhashed ones and make rehashing conditional (NFS doesn't pull
> anything similar, so it won't care), but your ->d_revalidate()
> has exact same problem as ext4_d_revalidate() one mentioned upthread -
> there's no warranty that dentry->d_parent will stay stable.
> 
> We are *NOT* guaranteed locked parent when ->d_revalidate() is called, or
> we would have to lock every damn directory on the way through the pathname
> resolution.  Moreover, ->d_revalidate() really can overlap with rename(2).

Hm, you are right.

I wonder why did we never see this race in practice, though.

I understand the race is rare, but we have a special test scenario called "racer"
(you can see it here: http://git.whamcloud.com/fs/lustre-release.git/tree/refs/heads/master:/lustre/tests/racer )

that does a bunch of operations in a limited namespace uncovering all sorts of
race conditions.

In order to make the effect better, I typically run it on a node with "unstable
cpus" (really a vm with 8 virtual cpus) where the cpus randomly get paused
for random amount of time up to a few seconds to extend the race windows.

All sorts of super unlikely stuff was hit in our code, but not this one.
(don't remember exact details and don't want to make stuff up on the spot ;)
but it certainly made some eye-opening races )

Is there something that suppresses it like our additional locking?
Hm, I guess that must be it. Rename always reaches to the server and that
invalidates the lustre dentry lock.
So if the rename progressed far enough that the lock got cancelled and dentry
got invalidated - and then lookup happened - it would never call revalidate
in the first place, so d_move does not matter.
Now if lookup and rename started at about the same time and we entered revalidate
before the lock got cancelled, the lock cancellation comes in and
blocks in ll_invalidate_aliases()->ll_lock_dcache() on inode_mutex, and
until it completes, the server cannot continue.

So probably just by this sheer luck we are not really affected even when VFS rules
say we are.

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html