On 11/22/2013 02:14 PM, Linus Torvalds wrote:
On Fri, Nov 22, 2013 at 11:04 AM, Waiman Long<Waiman.Long@xxxxxx> wrote:
In terms of single-thread performance (no contention), a 256K
lock/unlock loop was run on 2.4GHz and 2.93GHz Westmere x86-64
CPUs. The following table shows the average time (in ns) for a single
lock/unlock sequence (including the looping and timing overhead):
Lock Type            2.4GHz  2.93GHz
---------            ------  -------
Ticket spinlock       14.9    12.3
Read lock             17.0    13.5
Write lock            17.0    13.5
Queue read lock       16.0    13.4
Queue write lock       9.2     7.8
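
For readers who want to reproduce this kind of measurement, the uncontended
lock/unlock loop described above can be sketched in user space roughly as
follows. This is only an illustration, not the harness used for the table:
demo_lock_t, demo_lock(), demo_unlock() and avg_lock_unlock_ns() are
hypothetical names, the lock is a plain C11 test-and-set spinlock standing in
for the kernel locks, and POSIX clock_gettime() is assumed for timing.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime() under strict -std=c11 */
#endif
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define LOOPS (256 * 1024)  /* the "256K" iterations mentioned in the message */

/* Hypothetical stand-in lock: a minimal test-and-set spinlock. */
typedef struct { atomic_flag held; } demo_lock_t;

static void demo_lock(demo_lock_t *l)
{
    /* Spin until the flag was previously clear; with a single thread
     * the loop body never executes. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void demo_unlock(demo_lock_t *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Average time in ns for one lock/unlock pair. Like the numbers in the
 * table, the result includes the looping and timing overhead. */
static double avg_lock_unlock_ns(demo_lock_t *l, long loops)
{
    assert(loops > 0);
    uint64_t start = now_ns();
    for (long i = 0; i < loops; i++) {
        demo_lock(l);
        demo_unlock(l);
    }
    return (double)(now_ns() - start) / (double)loops;
}
```

A caller would initialize the lock with `demo_lock_t l = { ATOMIC_FLAG_INIT };`
and call `avg_lock_unlock_ns(&l, LOOPS)`. Absolute numbers from such a sketch
are not comparable to the in-kernel figures above.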
Can you verify for me that you re-did those numbers? Because it used
to be that the fair queue write lock was slower than the numbers you
now quote..
Was the cost of the fair queue write lock purely in the extra
conditional testing for whether the lock was supposed to be fair or
not, and now that you dropped that, it's fast? If so, then that's an
extra argument for the old conditional fair/unfair being complete
garbage.
Yes, the extra latency of the fair lock in the earlier patch was due to the
need to do a second cmpxchg(). That can be avoided by doing a read
first, but that is not good for cache behavior. So I optimized it for the
default unfair lock. By supporting only one version, there is no need to
do a second cmpxchg() anymore.
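
To make the trade-off concrete, here is a minimal user-space sketch of the two
ways a write-lock attempt can be issued: going straight to cmpxchg, or reading
the lock word first. It is not the code from the patch. demo_rwlock_t,
DEMO_WRITER and the demo_* functions are hypothetical, and C11 atomics stand in
for the kernel's cmpxchg(). The comment on cache-line ownership is one common
reading of the "not good for cache" remark, not something the message spells
out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock word: 0 means free, DEMO_WRITER means write-locked. */
#define DEMO_WRITER 0xffu

typedef struct { atomic_uint word; } demo_rwlock_t;

/* Variant A: go straight to cmpxchg. A single atomic read-modify-write
 * that asks for the cache line in exclusive state right away. */
static bool demo_write_trylock_direct(demo_rwlock_t *l)
{
    unsigned int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->word, &expected, DEMO_WRITER,
        memory_order_acquire, memory_order_relaxed);
}

/* Variant B: read first and only cmpxchg if the lock looks free. This
 * skips a cmpxchg that is bound to fail, but the plain load may fetch
 * the line in shared state and the cmpxchg then has to upgrade it. */
static bool demo_write_trylock_read_first(demo_rwlock_t *l)
{
    if (atomic_load_explicit(&l->word, memory_order_relaxed) != 0)
        return false;

    unsigned int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->word, &expected, DEMO_WRITER,
        memory_order_acquire, memory_order_relaxed);
}

/* Release the write lock; the caller must currently hold it. */
static void demo_write_unlock(demo_rwlock_t *l)
{
    assert(atomic_load_explicit(&l->word, memory_order_relaxed) == DEMO_WRITER);
    atomic_store_explicit(&l->word, 0, memory_order_release);
}
```

Both variants give the same result for a given lock state; they differ only in
how many memory operations are issued and in what state the cache line is
requested.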
Alternatively, maybe you just took the old timings, and the above
numbers are for the old unfair code, and *not* for the actual patch
you sent out?
So please double-check and verify.
Linus
I reran the timing test on the 2.93GHz processor. The timing is
practically the same. For the 2.4GHz processor, I reused the old numbers.
Regards,
Longman