Re: [PATCH v4 1/3] qspinlock: Introducing a 4-byte queue spinlock implementation

Waiman Long <waiman.long@xxxxxx> · Tue, 18 Feb 2014 19:50:13 -0500

On 02/18/2014 04:34 PM, Peter Zijlstra wrote:
> On Tue, Feb 18, 2014 at 02:39:31PM -0500, Waiman Long wrote:
>> The #ifdef is harder to take away here. The point is that doing a 32-bit
>> exchange may accidentally steal the lock, which then needs additional
>> code to handle. Doing a 16-bit exchange, on the other hand, will never
>> steal the lock and so doesn't need the extra handling code. I could
>> construct a function with different return values to handle the
>> different cases if you think it will make the code easier to read.
> Does it really pay to use xchg() with all those fixup cases? Why not
> have a single cmpxchg() loop that does just the exact atomic op you
> want?

The main reason for using xchg instead of cmpxchg is performance when
the lock is heavily contended. Under those circumstances, a task may
need several tries of read + atomic RMW before the cmpxchg succeeds,
and each failed try adds to the cacheline contention. With xchg, we need
at most 2 atomic ops. Using cmpxchg() does simplify the code a bit, but
at the expense of performance under heavy contention.
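
To make the trade-off concrete, here is a rough userspace sketch in C11
atomics, not the code from the patch. The word layout (lock in the low
byte, queue code above it), the LOCKED_MASK/QCODE_SHIFT macros, and both
helper names are assumptions made up for illustration. The cmpxchg
variant changes only the queue-code bits but may loop; the xchg variant
is a single atomic op but overwrites the lock byte too, so the caller
has to notice when it stole the lock and deal with it (that fixup is
not shown here).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Assumed layout for this sketch only: low byte = lock, rest = queue code. */
#define LOCKED_MASK 0xffu
#define QCODE_SHIFT 8

/*
 * cmpxchg loop: install qcode while preserving the lock byte.
 * Each failed compare-exchange reloads 'old' and retries, so the
 * number of atomic ops is unbounded under contention.
 * '*tries' counts the attempts made.
 */
static uint32_t set_qcode_cmpxchg(_Atomic uint32_t *lock, uint32_t qcode,
                                  unsigned *tries)
{
    uint32_t old = atomic_load(lock);
    uint32_t new_val;

    *tries = 0;
    do {
        new_val = (old & LOCKED_MASK) | (qcode << QCODE_SHIFT);
        (*tries)++;
    } while (!atomic_compare_exchange_weak(lock, &old, new_val));

    return old;
}

/*
 * xchg: one unconditional swap of the whole 32-bit word. It always
 * writes the lock byte as set, so if the old lock byte was clear we
 * have taken the lock by accident; '*stole_lock' reports that case
 * so the caller can run its fixup path.
 */
static uint32_t set_qcode_xchg(_Atomic uint32_t *lock, uint32_t qcode,
                               int *stole_lock)
{
    uint32_t old = atomic_exchange(lock, (qcode << QCODE_SHIFT) | 1u);

    *stole_lock = (old & LOCKED_MASK) == 0;
    return old;
}
```

An exchange limited to the queue-code bits would leave the lock byte
untouched, which is why the 16-bit case needs no such check.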

-Longman