On Mon, Mar 20, 2017 at 09:27:13PM +0800, Herbert Xu wrote:
> On Mon, Mar 20, 2017 at 02:23:57PM +0100, Peter Zijlstra wrote:
> >
> > So what bench/setup do you want ran?
>
> You can start by counting how many cycles an atomic op takes
> vs. how many cycles this new code takes.

On what uarch?

I think I tested a hand-coded asm version and it ended up at about double
the cycles for a cmpxchg loop vs the direct instruction on an IVB-EX
(until the memory bus saturated, at which point they took the same).
Newer parts will of course have different numbers.

Can't we run some iperf on a 40gbe fiber loop or something? It would be
very useful to have an actual workload we can run.
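
For reference, the kind of comparison being discussed (a direct atomic instruction vs. a cmpxchg retry loop) could be sketched in userspace roughly as below. This is only an illustration and not the asm that was actually tested: it uses C11 <stdatomic.h> rather than the kernel's atomic_t API, it reports wall-clock nanoseconds instead of cycles, and the names inc_direct(), inc_cmpxchg() and ns_per_op() are made up for the example. It is also single-threaded and uncontended, so it says nothing about the saturated-bus case.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() under strict -std=c11 */
#include <stdatomic.h>
#include <time.h>

/* Direct atomic read-modify-write; on x86 this typically becomes a
 * single LOCK-prefixed instruction. */
static inline void inc_direct(atomic_uint *v)
{
    atomic_fetch_add_explicit(v, 1, memory_order_relaxed);
}

/* The same increment written as a compare-exchange retry loop. */
static inline void inc_cmpxchg(atomic_uint *v)
{
    unsigned int old = atomic_load_explicit(v, memory_order_relaxed);

    /* On failure 'old' is updated to the current value, so just retry. */
    while (!atomic_compare_exchange_weak_explicit(v, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
        ;
}

/* Hypothetical helper: average nanoseconds per call of fn over iters calls.
 * The indirect call adds the same overhead to both variants. */
static double ns_per_op(void (*fn)(atomic_uint *), atomic_uint *v,
                        unsigned long iters)
{
    struct timespec t0, t1;
    double ns;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned long i = 0; i < iters; i++)
        fn(v);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
       + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

Calling ns_per_op(inc_direct, &v, n) and ns_per_op(inc_cmpxchg, &v, n) on the same counter with a large n gives a rough per-op ratio for one machine. The result depends heavily on the uarch and on contention, which is why a real workload would be more telling.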