Re: lockless case of retain_dentry() (was Re: [PATCH 09/15] fold the call of retain_dentry() into fast_dput())

Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> · Sun, 26 Nov 2023 09:59:12 -0800

On Sun, 26 Nov 2023 at 09:06, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> In this case, the 'retry' count is actually a noticeable part of the
> code generation, and is probably also why it has to save/restore
> '%rbx'.

Nope. The reason for having to save/restore a register is the

        spin_lock(&lockref->lock);
        lockref->count++;

sequence: since spin_lock() is a function call, it will clobber all
the registers that a function can clobber, and the calling lockref
function has to keep the 'lockref' argument live across that call. So
it needs a callee-saved register, which it then itself needs to save.
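
A minimal user-space sketch of that pattern, for illustration only. The
names toy_lockref, toy_spin_lock and toy_lockref_get_slow are
hypothetical stand-ins, not the kernel's code, and the register names in
the comments assume the x86-64 SysV calling convention:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the kernel's struct lockref. */
struct toy_lockref {
	atomic_flag lock;
	int count;
};

/* Kept out of line on purpose: the caller only sees an opaque call. */
__attribute__((noinline)) void toy_spin_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static inline void toy_spin_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * 'lr' arrives in an argument register (%rdi), which the call below may
 * clobber. It is needed again after the call, so a compiler will
 * typically copy it into a callee-saved register (for example %rbx)
 * and save/restore that register around the function body.
 */
void toy_lockref_get_slow(struct toy_lockref *lr)
{
	toy_spin_lock(&lr->lock);
	lr->count++;
	toy_spin_unlock(&lr->lock);
}
```

Compiling this with "gcc -O2 -S" and looking at toy_lockref_get_slow
should show whether your compiler emits the extra push/pop; the exact
output depends on compiler version and flags.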

Inlining the spinlock sequence entirely would fix it, but is the wrong
thing to do for the slow case.

Marking the spinlock functions with

  __attribute__((no_caller_saved_registers))

might actually be a reasonable option. It makes the spinlock itself
more expensive (since now it saves/restores all the registers it
uses), but in this case that's the right thing to do.

Of course, in this case, lockref has already done the optimistic
"check the lock" version, so our spinlock code that does that

        LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);

which first tries to do the trylock, is all kinds of wrong.

In a perfect world, the lockref code actually wants only the
slow-path, since it has already done the fast-path case. And it would
have that "slow path saves all registers" thing. That might be a good
idea for spinlocks in general, who knows..
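
A rough sketch of that "fast path first, then slow path only" shape,
using C11 atomics. Everything here is an assumption made for
illustration: the toy_* names, the layout (lock bit in bit 0, count in
the upper 32 bits of one 64-bit word), and the simple spinning slow
path. It also leaves out the retry limit that the real lockref code has:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_LOCKED	1ull		/* hypothetical lock bit */
#define TOY_COUNT_ONE	(1ull << 32)	/* hypothetical count increment */

struct toy_lockref {
	_Atomic uint64_t lock_count;
};

/* Slow path only: no inline trylock in the caller, just wait here. */
__attribute__((noinline))
static void toy_lock_slowpath(struct toy_lockref *lr)
{
	for (;;) {
		uint64_t old = atomic_load_explicit(&lr->lock_count,
						    memory_order_relaxed);
		if (!(old & TOY_LOCKED) &&
		    atomic_compare_exchange_weak_explicit(&lr->lock_count,
				&old, old | TOY_LOCKED,
				memory_order_acquire, memory_order_relaxed))
			return;
	}
}

/* Returns true if the lockless fast path did the increment. */
bool toy_lockref_get(struct toy_lockref *lr)
{
	uint64_t old = atomic_load_explicit(&lr->lock_count,
					    memory_order_relaxed);

	/* Fast path: bump the count while the lock is observed free. */
	while (!(old & TOY_LOCKED)) {
		if (atomic_compare_exchange_weak_explicit(&lr->lock_count,
				&old, old + TOY_COUNT_ONE,
				memory_order_relaxed, memory_order_relaxed))
			return true;
	}

	/*
	 * The lock was just seen held, so an extra trylock here would only
	 * repeat the check that failed. Go straight to the slow path.
	 */
	toy_lock_slowpath(lr);
	atomic_fetch_add_explicit(&lr->lock_count, TOY_COUNT_ONE,
				  memory_order_relaxed);
	atomic_fetch_and_explicit(&lr->lock_count, ~TOY_LOCKED,
				  memory_order_release);
	return false;
}

/* Helper to read the count half of the word. */
uint32_t toy_lockref_count(struct toy_lockref *lr)
{
	return (uint32_t)(atomic_load_explicit(&lr->lock_count,
					       memory_order_relaxed) >> 32);
}
```

In this toy version the only out-of-line call is the slow path, which
is where a "saves all registers" convention would pay its cost.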

Oh well. Probably not worth worrying about. In my profiles, lockref
looks pretty good even under heavy dentry load. Even if it's not
perfect.

                 Linus