On Tue, Jan 14, 2020 at 10:59:18PM -0500, Alex Kogan wrote:
> +/* this function is called only when the primary queue is empty */
> +static inline bool cna_try_change_tail(struct qspinlock *lock, u32 val,
> +				       struct mcs_spinlock *node)
> +{
> +	struct mcs_spinlock *head_2nd, *tail_2nd;
> +	u32 new;
> +
> +	/* If the secondary queue is empty, do what MCS does. */
> +	if (node->locked <= 1)
> +		return __try_clear_tail(lock, val, node);
> +
> +	/*
> +	 * Try to update the tail value to the last node in the secondary queue.
> +	 * If successful, pass the lock to the first thread in the secondary
> +	 * queue. Doing those two actions effectively moves all nodes from the
> +	 * secondary queue into the main one.
> +	 */
> +	tail_2nd = decode_tail(node->locked);
> +	head_2nd = tail_2nd->next;
> +	new = ((struct cna_node *)tail_2nd)->encoded_tail + _Q_LOCKED_VAL;
> +
> +	if (atomic_try_cmpxchg_relaxed(&lock->val, &val, new)) {
> +		/*
> +		 * Try to reset @next in tail_2nd to NULL, but no need to check
> +		 * the result - if failed, a new successor has updated it.
> +		 */

I think you actually have an ordering bug here; the load of head_2nd
*must* happen before the atomic_try_cmpxchg(), otherwise it might
observe the new next and clear a valid next pointer.

What would be the best fix for that; I'm thinking:

	head_2nd = smp_load_acquire(&tail_2nd->next);

Will?

> +		cmpxchg_relaxed(&tail_2nd->next, head_2nd, NULL);
> +		arch_mcs_pass_lock(&head_2nd->locked, 1);
> +		return true;
> +	}
> +
> +	return false;
> +}
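
For illustration only, the hazard described above can be replayed in user space with C11 atomics. This is a minimal sketch, not kernel code: struct node, link_survives() and the lock word value 1 are hypothetical stand-ins, and memory_order_acquire is only a rough analogue of smp_load_acquire(). A single thread cannot exhibit real reordering, so the load_early flag simply selects which value the snapshot of tail->next observes: the one from before the tail is published, or the one a reordered load could see afterwards.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an MCS queue node; only @next matters here. */
struct node {
	_Atomic(struct node *) next;
};

/*
 * Replay the interleaving in a single thread. With load_early the snapshot
 * of tail.next is taken before the tail is published in the lock word;
 * without it the snapshot is taken afterwards, which is what a reordered
 * load could observe. Returns true if the new waiter's link survives.
 */
static bool link_survives(bool load_early)
{
	struct node head, tail, waiter;
	struct node *snapshot = NULL;
	atomic_uint lock_word;
	unsigned int old = 0;
	bool swapped;

	atomic_init(&head.next, NULL);
	atomic_init(&waiter.next, NULL);
	atomic_init(&tail.next, &head);	/* secondary queue is circular */
	atomic_init(&lock_word, 0);

	if (load_early)
		snapshot = atomic_load_explicit(&tail.next, memory_order_acquire);

	/* Publish the tail; 1 is an arbitrary placeholder for the encoded tail. */
	swapped = atomic_compare_exchange_strong_explicit(&lock_word, &old, 1u,
			memory_order_relaxed, memory_order_relaxed);
	assert(swapped);

	/* A new waiter now sees the tail and links itself behind it. */
	atomic_store_explicit(&tail.next, &waiter, memory_order_release);

	if (!load_early)
		snapshot = atomic_load_explicit(&tail.next, memory_order_relaxed);

	/* Reset @next only if it still holds the snapshot. */
	atomic_compare_exchange_strong_explicit(&tail.next, &snapshot, NULL,
			memory_order_relaxed, memory_order_relaxed);

	return atomic_load_explicit(&tail.next, memory_order_relaxed) == &waiter;
}
```

With the early snapshot the final compare-and-exchange fails harmlessly, because tail.next no longer matches the old head, and the new waiter stays linked. With the late snapshot the compare-and-exchange matches the new waiter's pointer and clears it, which is the "clear a valid next pointer" case.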