Re: [PATCH 2/4] fs/dcache: Split __d_lookup_done()

On Mon, Jun 13, 2022 at 04:07:10PM +0200, Sebastian Andrzej Siewior wrote:
> __d_lookup_done() wakes waiters on dentry::d_wait inside a preemption
> disabled region. This violates the PREEMPT_RT constraints as the wake up
> acquires wait_queue_head::lock which is a "sleeping" spinlock on RT.

I'd probably turn that into something like

__d_lookup_done() wakes waiters on dentry->d_wait.  On PREEMPT_RT we are
not allowed to do that with preemption disabled, since the wakeup
acquires wait_queue_head::lock, which is a "sleeping" spinlock on RT.

Calling it under dentry->d_lock is not a problem, since that is also
a "sleeping" spinlock on the same configs.  Unfortunately, two of
its callers (__d_add() and __d_move()) are holding more than just ->d_lock
and that needs to be dealt with.

The key observation is that wakeup can be moved to any point before
dropping ->d_lock.

> As a first step to solve this, move the wake up outside of the
> hlist_bl_lock() held section.
> 
> This is safe because:
> 
>   1) The whole sequence including the wake up is protected by dentry::lock.
> 
>   2) The waitqueue head is allocated by the caller on stack and can't go
>      away until the whole callchain completes.

	That's too vague and in one case simply incorrect - the call
of d_alloc_parallel() in nfs_call_unlink() does *not* have wq in the stack
frame of anything in the callchain.  Incidentally, another unusual caller
(d_add_ci()) has a bug (see below).  What really matters is that we can't
reach the destruction of wq without __d_lookup_done() under ->d_lock.

Waiters get inserted into ->d_wait only after they'd taken ->d_lock
and observed DCACHE_PAR_LOOKUP in flags.  As long as they are
woken up (and evicted from the queue) between the moment __d_lookup_done()
removes DCACHE_PAR_LOOKUP and the moment ->d_lock is dropped, we are safe,
since the waitqueue ->d_wait points to won't get destroyed without
__d_lookup_done(dentry) having been called (under ->d_lock).

->d_wait is set only by d_alloc_parallel() and only in case when
it returns a freshly allocated in-lookup dentry.  Whenever that happens,
we are guaranteed that __d_lookup_done() will be called for resulting
dentry (under ->d_lock) before the wq in question gets destroyed.

With two exceptions, wq lives in the call frame of the caller of
d_alloc_parallel() and we have an explicit d_lookup_done() on the
resulting in-lookup dentry before we leave that frame.
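That common pattern looks roughly like this (my sketch, abbreviated from
the shape shared by __lookup_slow() and lookup_open(); error handling and
the actual ->lookup() call elided):

```c
/* Sketch only - wq lives in this frame, and d_lookup_done() runs
 * before we leave it. */
static struct dentry *lookup_slow_sketch(const struct qstr *name,
					 struct dentry *dir)
{
	DECLARE_WAIT_QUEUE_HEAD_ONSTACK(wq);
	struct dentry *dentry;

	dentry = d_alloc_parallel(dir, name, &wq);
	if (IS_ERR(dentry) || !d_in_lookup(dentry))
		return dentry;		/* error, or found in dcache */

	/* ... do the actual lookup ... */

	d_lookup_done(dentry);		/* before wq goes out of scope */
	return dentry;
}
```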

One of those exceptions is nfs_call_unlink(), where wq is embedded into
(dynamically allocated) struct nfs_unlinkdata.  It is destroyed in
nfs_async_unlink_release(), after an explicit d_lookup_done() on the
dentry that wq went into.

Remaining exception is d_add_ci().  There wq is what we'd found in
->d_wait of the d_add_ci() argument.  Callers of d_add_ci() are two
instances of ->d_lookup() and they must have been given an in-lookup
dentry.  Which means that they'd been called by __lookup_slow() or
lookup_open(), with wq in the call frame of one of those.

[[[
Result of d_alloc_parallel() in d_add_ci() is fed to
d_splice_alias(), which *NORMALLY* feeds it to __d_add() or
__d_move() in a way that will have __d_lookup_done() applied to it.

	However, there is a nasty possibility - d_splice_alias() might
legitimately fail without having marked the sucker not in-lookup.  dentry
will get dropped by d_add_ci(), so ->d_wait won't end up pointing to freed
object, but it's still a bug - retain_dentry() will scream bloody murder
upon seeing that, and for a good reason; we'll get hash chain corrupted.
It's impossible to hit without corrupted fs image (ntfs or case-insensitive
xfs), but it's a bug.  Fix is a one-liner: in d_add_ci(), add
	d_lookup_done(found);
right after
        res = d_splice_alias(inode, found);
	if (res) {
and with that done the last sentence about d_add_ci() turns
into
]]]
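With that one-liner applied, the relevant part of d_add_ci() would look
roughly like this (my sketch, abbreviated; the surrounding function body
is elided):

```c
	found = d_alloc_parallel(dentry->d_parent, name, dentry->d_wait);
	/* ... error handling elided ... */
	res = d_splice_alias(inode, found);
	if (res) {
		d_lookup_done(found);	/* the proposed one-liner */
		dput(found);
		return res;
	}
	return found;
```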

Result of d_alloc_parallel() in d_add_ci() is fed to
d_splice_alias(), which either returns non-NULL (and d_add_ci() does
d_lookup_done()) or feeds dentry to __d_add() that will do
__d_lookup_done() under ->d_lock.  That concludes the analysis.


PS: I'm not sure we need to do this migration of wakeup in stages;
lift it into the caller of __d_lookup_done() as the first step,
then move the damn thing all the way to end_dir_add().  Analysis
can go into either...


