Re: [PATCH 2/2] pidfd: add pidfdfs

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sun, 18 Feb 2024 10:57:19 -0800

On Sun, 18 Feb 2024 at 10:08, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> The only ugliness I see is the one that comes from the original code -
> I'm not thrilled about the "return -EAGAIN" part, and I think that if
> we found a previously stashed entry after all, we should loop.
>
> But I think that whole horror comes from a fear of an endless loop
> when the dentry is marked dead, and another CPU still sees the old
> value (so you can't re-use it, and yet it's not NULL).

Actually, I think this is fairly easily fixable, but let's fix it
*after* you've done your cleanups.

The eventual fix is fairly simple: allow installing a new dentry not
just as a replacement for a previous NULL dentry, but also replacing a
previous dead dentry.

That requires just two simple changes:

 - the ->d_prune() callback should no longer just blindly set the
stashed value to NULL, it would do

        // Somebody could have re-used our stash as the dentry
        // died, so we only NULL it out of if matches our pruned one
        cmpxchg(&stashed, dentry, NULL);

 - when installing, instead of doing

        if (cmpxchg(&stashed, NULL, dentry) .. FAIL ..

   we'd loop with something like this:

        guard(rcu)();
        for (;;) {
                struct dentry *old;

                // Assume any old dentry was cleared out
                old = cmpxchg(&stashed, NULL, dentry);
                if (likely(!old))
                        break;

                // Maybe somebody else installed a good dentry
                // .. so release ours and use the new one
                if (lockref_get_not_dead(&old->d_lockref)) {
                        d_delete(dentry);
                        dput(dentry);
                        return old;
                }

                // There's an old dead dentry there, try to take it over
                if (likely(try_cmpxchg(&stashed, old, dentry)))
                        break;

                // Ooops, that failed, to back and try again
                cpu_relax();
        }

        // We successfully installed our dentry
        // as the new stashed one
        return dentry;

which really isn't doesn't look that complicated (note the RCU guard
as a way to make sure this all runs RCU-locked without needing to
worry about the unlock sequence).

Note: your initial optimistic "get_stashed_dentry()" stays exactly as
it is. The above loop is just for the "oh, we didn't trivially re-use
an old dentry, so now we need to allocate a new one and install it"
case.

Anyway, the above is written just in the MUA, there's no testing of
the above, and again - I think this should be done *after* you've done
the cleanups of the current code. But I think it would clarify the odd
race condition with an old dentry dying just as a new one is created,
and make sure there isn't some -EAGAIN case that people need to worry
about.

Because either we can re-use the old one, or there isn't an old one,
or we find a dead one that can't be reused but can just be replaced.

Fairly straightforward, no?

               Linus