On Sun, 26 Nov 2023 at 09:06, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> In this case, the 'retry' count is actually a noticeable part of the
> code generation, and is probably also why it has to save/restore
> '%rbx'.

Nope. The reason for having to save/restore a register is the

        spin_lock(&lockref->lock);
        lockref->count++;

sequence: since spin_lock() is a function call, it will clobber all
the registers that a function can clobber, and the callee has to keep
the 'lockref' argument somewhere. So it needs a callee-saved register,
which it then itself needs to save.

Inlining the spinlock sequence entirely would fix it, but is the wrong
thing to do for the slow case.

Marking the spinlock functions with

        __attribute__((no_caller_saved_registers))

might actually be a reasonable option. It makes the spinlock itself
more expensive (since now it saves/restores all the registers it
uses), but in this case that's the right thing to do.

Of course, in this case, lockref has already done the optimistic
"check the lock" version, so our spinlock code that does that

        LOCK_CONTENDED(lock, do_raw_spin_trylock, do_raw_spin_lock);

which first tries to do the trylock, is all kinds of wrong.

In a perfect world, the lockref code actually wants only the
slow-path, since it has already done the fast-path case. And it would
have that "slow path saves all registers" thing. That might be a good
idea for spinlocks in general, who knows..

Oh well. Probably not worth worrying about. In my profiles, lockref
looks pretty good even under heavy dentry load. Even if it's not
perfect.

                 Linus
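
[Editorial illustration, not part of the original message.] The
register-pressure point above can be modelled in userspace. This is
only a rough sketch under stated assumptions: `struct lockref_model`,
`model_spin_lock()`, `model_spin_unlock()` and
`model_lockref_get_slow()` are hypothetical names invented for this
example, the lock is a toy C11 `atomic_flag` rather than the kernel's
spinlock, and `__attribute__((noinline))` assumes GCC or Clang.

```c
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's struct lockref. */
struct lockref_model {
    atomic_flag lock;   /* toy spinlock, not the kernel's spinlock_t */
    int count;
};

/* Kept out of line on purpose, so taking the lock is a real call. */
__attribute__((noinline))
void model_spin_lock(atomic_flag *lock)
{
    while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
        ;   /* spin until the flag was previously clear */
}

__attribute__((noinline))
void model_spin_unlock(atomic_flag *lock)
{
    atomic_flag_clear_explicit(lock, memory_order_release);
}

/*
 * Slow path only. 'ref' arrives in an argument register, which the
 * call to model_spin_lock() is allowed to clobber. Because 'ref' is
 * still needed after the call (for ref->count++), the compiler has to
 * keep it somewhere that survives the call: typically a callee-saved
 * register, which this function must then save and restore itself.
 */
void model_lockref_get_slow(struct lockref_model *ref)
{
    model_spin_lock(&ref->lock);
    ref->count++;
    model_spin_unlock(&ref->lock);
}
```

Compiling this with optimization and reading the assembly for
`model_lockref_get_slow()` is one way to check whether a save/restore
of a callee-saved register appears around the call; the exact output
depends on the compiler, flags and target ABI.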