On Thu, Apr 17, 2014 at 11:03:58AM -0400, Waiman Long wrote: > There is a problem in the current trylock_pending() function. When the > lock is free, but the pending bit holder hasn't grabbed the lock & > cleared the pending bit yet, the trylock_pending() function will fail. I remember seeing some of this.. > It can be seen that the queue spinlock is slower than the ticket > spinlock when there are 2 or 3 contending tasks. In all the other case, > the queue spinlock is either equal or faster than the ticket spinlock. So with my code I get: qspinlock ticket local: 2: 8741.853010 2: 8812.042460 remote: 2: 8549.731795 2: 8709.005695 And that is without this optimization. Also note that I don't have this cmpxchg loop anymore. > kernel/locking/qspinlock.c | 32 ++++++++++++++++++++++++++++++-- > 1 files changed, 30 insertions(+), 2 deletions(-) > > diff --git a/kernel/locking/qspinlock.c b/kernel/locking/qspinlock.c > index 55601b4..497da24 100644 > --- a/kernel/locking/qspinlock.c > +++ b/kernel/locking/qspinlock.c > @@ -216,6 +216,7 @@ xchg_tail(struct qspinlock *lock, u32 tail, u32 *pval) > static inline int trylock_pending(struct qspinlock *lock, u32 *pval) > { > u32 old, new, val = *pval; > + int retry = 1; > > /* > * trylock || pending > @@ -225,11 +226,38 @@ static inline int trylock_pending(struct qspinlock *lock, u32 *pval) > */ > for (;;) { > /* > - * If we observe any contention; queue. > + * If we observe that the queue is not empty, > + * return and be queued. > */ > - if (val & ~_Q_LOCKED_MASK) > + if (val & _Q_TAIL_MASK) > return 0; > > + if ((val & _Q_LOCKED_PENDING_MASK) == > + (_Q_LOCKED_VAL|_Q_PENDING_VAL)) { > + /* > + * If both the lock and pending bits are set, we wait > + * a while to see if that either bit will be cleared. > + * If that is no change, we return and be queued. > + */ > + if (!retry) > + return 0; > + retry--; > + cpu_relax(); > + cpu_relax(); > + *pval = val = atomic_read(&lock->val); > + continue; Since you gave up optimizing the _Q_PENDING_BITS != 8 case why bother with this? The switch from _Q_PENDING_VAL to _Q_LOCKED_VAL is atomic by virtue of your (endian challenged) clear_pending_set_locked(). > + } else if ((val & _Q_LOCKED_PENDING_MASK) == _Q_PENDING_VAL) { > + /* > + * Pending bit is set, but not the lock bit. > + * Assuming that the pending bit holder is going to > + * set the lock bit and clear the pending bit soon, > + * it is better to wait than to exit at this point. > + */ > + cpu_relax(); > + *pval = val = atomic_read(&lock->val); > + continue; > + } > + > new = _Q_LOCKED_VAL; > if (val == new) > new |= _Q_PENDING_VAL; Wouldn't something like: while (atomic_read(&lock->val) == _Q_PENDING_VAL) cpu_relax(); before the cmpxchg loop have gotten you all this? I just tried this on my code and I cannot see a difference. -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html