On Thu, Oct 16, 2014 at 02:10:38PM -0400, Waiman Long wrote:
> +static inline void pv_init_node(struct mcs_spinlock *node)
> +{
> +	struct pv_qnode *pn = (struct pv_qnode *)node;
> +
> +	BUILD_BUG_ON(sizeof(struct pv_qnode) > 5*sizeof(struct mcs_spinlock));
> +
> +	if (!pv_enabled())
> +		return;
> +
> +	pn->cpustate = PV_CPU_ACTIVE;
> +	pn->mayhalt = false;
> +	pn->mycpu = smp_processor_id();
> +	pn->head = PV_INVALID_HEAD;
> +}

> @@ -333,6 +393,7 @@ queue:
>  	node += idx;
>  	node->locked = 0;
>  	node->next = NULL;
> +	pv_init_node(node);
>
>  	/*
>  	 * We touched a (possibly) cold cacheline in the per-cpu queue node;

So even if !pv_enabled(), the compiler still has to emit the code for
that inline at the call site, which adds register pressure, icache
pressure and lovely stuff like that to the native slow path.

The patch I had used pv-ops for these things; those turn into NOPs in
the regular case and into callee-saved function calls in the PV case.

That still does not entirely eliminate the cost, but it does reduce it
significantly.

Please consider using that.
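To make the suggestion concrete, here is a rough userspace sketch of the hook-based shape. It is not the earlier patch and not the kernel's pv-ops API: pv_qnode_sketch, pv_lock_ops_sketch, native_init_node(), pv_init_node_impl(), pv_enable_sketch() and queue_init_node() are all hypothetical names, and the CPU number is passed in as a parameter because smp_processor_id() does not exist outside the kernel. In plain C the hook remains an indirect call; the assumption is that in the kernel the paravirt patching code rewrites such a call site at boot, so the native case ends up as NOPs.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the constants used in the quoted patch. */
#define PV_CPU_ACTIVE   1
#define PV_INVALID_HEAD (-1)

/* Hypothetical, simplified version of the PV queue node fields. */
struct pv_qnode_sketch {
	int  cpustate;
	bool mayhalt;
	int  mycpu;
	int  head;
};

/* Hook table: the slow path only ever calls through this pointer. */
struct pv_lock_ops_sketch {
	void (*init_node)(struct pv_qnode_sketch *pn, int cpu);
};

/* Native case: nothing to do, so the node is left untouched. */
static void native_init_node(struct pv_qnode_sketch *pn, int cpu)
{
	(void)pn;
	(void)cpu;
}

/* PV case: the same stores as pv_init_node() in the quoted patch. */
static void pv_init_node_impl(struct pv_qnode_sketch *pn, int cpu)
{
	pn->cpustate = PV_CPU_ACTIVE;
	pn->mayhalt  = false;
	pn->mycpu    = cpu;
	pn->head     = PV_INVALID_HEAD;
}

/* Defaults to the native no-op; switched once when PV is detected. */
static struct pv_lock_ops_sketch pv_lock_ops = {
	.init_node = native_init_node,
};

static void pv_enable_sketch(void)
{
	pv_lock_ops.init_node = pv_init_node_impl;
}

/*
 * Call site in the queueing slow path: no pv_enabled() test and no PV
 * stores are inlined here, only the call through the hook.
 */
static inline void queue_init_node(struct pv_qnode_sketch *pn, int cpu)
{
	pv_lock_ops.init_node(pn, cpu);
}
```

The point of this shape is that the PV-only stores live out of line behind the hook, so the native slow path carries a single patchable call site instead of the inlined body plus a runtime test.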