On Sat, Jul 11, 2020 at 02:21:27PM -0400, Waiman Long wrote:

> +static DEFINE_PER_CPU_READ_MOSTLY(u8, pcpu_lockval) = _Q_LOCKED_VAL;
>
>  /*
>   * We must be able to distinguish between no-tail and the tail at 0:0,
> @@ -138,6 +139,19 @@ struct mcs_spinlock *grab_mcs_node(struct mcs_spinlock *base, int idx)
>
>  #define _Q_LOCKED_PENDING_MASK (_Q_LOCKED_MASK | _Q_PENDING_MASK)
>
> +static __init int __init_pcpu_lockval(void)
> +{
> +	int cpu;
> +
> +	for_each_possible_cpu(cpu) {
> +		u8 lockval = (cpu + 2 < _Q_LOCKED_MASK - 1) ? cpu + 2
> +							    : _Q_LOCKED_VAL;
> +		per_cpu(pcpu_lockval, cpu) = lockval;
> +	}
> +	return 0;
> +}
> +early_initcall(__init_pcpu_lockval);

> +	u8 lockval = this_cpu_read(pcpu_lockval);

Urgh... so you'd rather read a guaranteed cold line than to use
smp_processor_id(), which we already use anyway? I'm skeptical this helps
anything, and it certainly makes the code more horrible :-(
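
For illustration only, and not part of the patch or the reply: a minimal
userspace sketch of what computing the lock byte straight from the CPU
number could look like, instead of loading it from a per-CPU variable. The
helper name lockval_for_cpu() is hypothetical, and the constants are assumed
to match the kernel's 8-bit locked byte (_Q_LOCKED_VAL == 1,
_Q_LOCKED_MASK == 0xff). In the kernel the argument would presumably come
from smp_processor_id().

```c
#include <stdint.h>

/* Assumed values, mirroring an 8-bit locked byte; not taken from the patch. */
#define Q_LOCKED_VAL   1U
#define Q_LOCKED_MASK  0xffU

/*
 * Hypothetical helper: the same cpu -> lockval mapping as the quoted
 * __init_pcpu_lockval() loop, but computed on demand from the CPU number,
 * so no per-CPU cache line has to be read.
 */
static inline uint8_t lockval_for_cpu(unsigned int cpu)
{
	unsigned int val = cpu + 2;

	/* CPU numbers too large for the byte fall back to the plain locked value. */
	return (val < Q_LOCKED_MASK - 1) ? (uint8_t)val : (uint8_t)Q_LOCKED_VAL;
}
```

Under these assumptions, CPUs 0 through 251 map to the values 2 through 253,
and every higher CPU number gets the ordinary locked value of 1.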