Peter,
I was trying to implement the generic queue code exchange using
cmpxchg as you suggested. However, when I gathered the performance
data, the code performed worse than I expected at higher contention
levels. Below are the execution times reported by the benchmark tool
that I sent you:
                              [xchg]       [cmpxchg]
  # of tasks   Ticket lock   Queue lock   Queue lock
  ----------   -----------   ----------   ----------
       1             135          135          135
       2             732         1315         1102
       3            1827         2372         2681
       4            2689         2934         5392
       5            3736         3658         7696
       6            4942         4434         9876
       7            6304         5176        11901
       8            7736         5955        14551
Below is the code that I used:
static inline u32 queue_code_xchg(struct qspinlock *lock, u32 *ocode,
				  u32 ncode)
{
	while (true) {
		u32 qlcode = atomic_read(&lock->qlcode);

		if (qlcode == 0) {
			/*
			 * Try to get the lock
			 */
			if (atomic_cmpxchg(&lock->qlcode, 0,
					   _QSPINLOCK_LOCKED) == 0)
				return 1;
		} else if (qlcode & _QSPINLOCK_LOCKED) {
			*ocode = atomic_cmpxchg(&lock->qlcode, qlcode,
						ncode | _QSPINLOCK_LOCKED);
			if (*ocode == qlcode) {
				/* Clear lock bit before return */
				*ocode &= ~_QSPINLOCK_LOCKED;
				return 0;
			}
		}
		/*
		 * Wait if atomic_cmpxchg() fails or lock is
		 * temporarily free.
		 */
		arch_mutex_cpu_relax();
	}
}
My cmpxchg code is not optimal, and I can probably tune it to perform
better. Given the trend that I was seeing, however (under contention,
the read-then-cmpxchg sequence keeps failing and retrying, while an
unconditional xchg always succeeds on its first attempt), I think I
will keep the current xchg code, but I will package it in an inline
function.
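
For reference, below is a minimal sketch of how the xchg version could
be packaged, reusing the function name, the _QSPINLOCK_LOCKED bit, and
the return convention of the cmpxchg code above. It is only a sketch:
the handling of the accidental-lock-steal cases, and the convention
that *ocode comes back nonzero on a return value of 1 when a successor
has already queued behind us, are assumptions for illustration and not
necessarily what my real code does.

static inline u32 queue_code_xchg(struct qspinlock *lock, u32 *ocode,
				  u32 ncode)
{
	/*
	 * Unconditionally exchange in our queue node code with the lock
	 * bit set. Unlike cmpxchg, xchg cannot fail, so there is no
	 * retry loop and every task makes progress in one atomic op.
	 */
	*ocode = atomic_xchg(&lock->qlcode, ncode | _QSPINLOCK_LOCKED);

	if (likely(*ocode & _QSPINLOCK_LOCKED)) {
		/* Clear lock bit before return */
		*ocode &= ~_QSPINLOCK_LOCKED;
		return 0;	/* Queued behind the task owning *ocode */
	}
	if (*ocode == 0) {
		/*
		 * The lock was free, so the xchg also acquired it, but
		 * it left our node code in the lock word. Try to reset
		 * the word to a plain locked state. If the cmpxchg
		 * fails, another task has already queued behind us;
		 * pass our own code back through *ocode so the caller
		 * knows it must make the successor the new queue head
		 * (an assumed convention, see above).
		 */
		if (atomic_cmpxchg(&lock->qlcode,
				   ncode | _QSPINLOCK_LOCKED,
				   _QSPINLOCK_LOCKED) !=
				   (ncode | _QSPINLOCK_LOCKED))
			*ocode = ncode;
		return 1;
	}
	/*
	 * The lock was momentarily free but the queue was not empty,
	 * so the xchg stole the lock from the queue head. Hand it back
	 * by clearing the lock bit (it is still set, since every xchg
	 * above stores it) and then wait for our turn like any other
	 * queued task.
	 */
	atomic_sub(_QSPINLOCK_LOCKED, &lock->qlcode);
	return 0;
}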
-Longman