On 01/31/2014 04:26 AM, Peter Zijlstra wrote:
> On Thu, Jan 30, 2014 at 04:17:15PM +0100, Peter Zijlstra wrote:
>> The below is still small and actually works.
> OK, so having actually worked through the thing; I realized we can
> actually do a version without MCS lock and instead use a ticket lock for
> the waitqueue.
>
> This is both smaller (back to 8 bytes for the rwlock_t), and should be
> faster under moderate contention for not having to touch extra
> cachelines.
>
> Completely untested and with a rather crude generic ticket lock
> implementation to illustrate the concept:
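Peter's patch itself is not reproduced here. To illustrate only the size argument, here is a hypothetical C11 sketch. The names sketch_rwlock, ticket_lock and ticket_unlock and the field layout are assumptions and do not come from the patch. A 32-bit reader/writer word plus a 16-bit/16-bit ticket lock serving as the waitqueue fits in 8 bytes. The static assertion assumes that _Atomic uint16_t and _Atomic uint32_t have their plain sizes, which holds on GCC and Clang for common architectures.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical generic ticket lock: 2 x 16 bits. */
struct ticket_lock {
    _Atomic uint16_t next;   /* next ticket to hand out */
    _Atomic uint16_t owner;  /* ticket currently allowed to proceed */
};

/* Hypothetical rwlock layout: reader/writer word + ticket waitqueue. */
struct sketch_rwlock {
    _Atomic uint32_t cnts;      /* reader count and writer bits (encoding not shown) */
    struct ticket_lock wait;    /* FIFO queue for slowpath waiters */
};

_Static_assert(sizeof(struct sketch_rwlock) == 8, "rwlock should stay 8 bytes");

static void ticket_lock(struct ticket_lock *l)
{
    /* Take a ticket; the 16-bit counter wraps modulo 2^16. */
    uint16_t me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

    /* Every waiter polls the same 'owner' field, i.e. the same cacheline. */
    while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
        ;
}

static void ticket_unlock(struct ticket_lock *l)
{
    /* Only the holder writes 'owner', so a plain load + release store is enough. */
    uint16_t cur = atomic_load_explicit(&l->owner, memory_order_relaxed);
    atomic_store_explicit(&l->owner, (uint16_t)(cur + 1), memory_order_release);
}
```

Every waiter in ticket_lock() spins on the shared owner field. That is the cacheline-bouncing concern raised in the reply that follows.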
Using a ticket lock instead will have the same scalability problem as
the ticket spinlock: all the waiting threads spin on the same lock
cacheline, causing a lot of cache-bouncing traffic. That is the reason
why I want to replace the ticket spinlock with a queue spinlock. If the
16-byte size is an issue, I can use the same trick as in the queue
spinlock patch to reduce its size down to 8 bytes, at the cost of a bit
more overhead in the slowpath.
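To make the contrast concrete, below is a minimal sketch of a textbook MCS queue lock (Mellor-Crummey and Scott) using C11 atomics. Queue spinlocks build on this idea. It is not the queue spinlock patch. That patch packs the queue tail into the lock word, whereas this sketch stores a full tail pointer and needs one queue node per waiter. The names mcs_node, mcs_lock, mcs_acquire and mcs_release are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mcs_node {
    struct mcs_node *_Atomic next;  /* successor in the queue, if any */
    _Atomic bool locked;            /* true while this waiter must keep spinning */
};

struct mcs_lock {
    struct mcs_node *_Atomic tail;  /* last waiter in the queue, NULL if free */
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);

    /* Append ourselves as the new tail of the queue. */
    struct mcs_node *prev =
        atomic_exchange_explicit(&lock->tail, me, memory_order_acq_rel);
    if (prev == NULL)
        return;                     /* queue was empty: lock acquired */

    /* Link behind the predecessor, then spin on our OWN node only. */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        ;
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
    struct mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);

    if (succ == NULL) {
        /* No visible successor: try to mark the lock free. */
        struct mcs_node *expected = me;
        if (atomic_compare_exchange_strong_explicit(&lock->tail, &expected, NULL,
                                                    memory_order_acq_rel,
                                                    memory_order_acquire))
            return;
        /* Someone swapped in as tail but has not linked itself yet; wait. */
        while ((succ = atomic_load_explicit(&me->next, memory_order_acquire)) == NULL)
            ;
    }
    /* Hand off: only the successor's node (its cacheline) is written. */
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}
```

Each waiter spins only on its own node's locked flag, and a release writes only to the successor's node. A handoff therefore touches one waiter's cacheline instead of invalidating a line that every waiter is polling.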
Another thing I want to discuss is whether a bit more overhead in
moderate contention cases is really such a big deal. With moderate
contention, I suppose the amount of time spent in the locking functions
will be just a few percent at most for real workloads. It won't really
be noticeable if the locking functions take, say, 50% more time to
finish. Anyway, I am going to do more performance testing on low-end
machines.
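The overhead argument can be put as a back-of-the-envelope calculation. It makes a simplifying assumption: locking time scales uniformly, and a slower lock does not itself lengthen hold times or increase contention. Let f be the fraction of runtime spent in the locking functions and s the factor by which those functions slow down. With an assumed f = 3% ("a few percent") and s = 1.5:

```latex
\frac{T'}{T} = (1 - f) + f\,s = 1 + f\,(s - 1),
\qquad f = 0.03,\; s = 1.5 \;\Rightarrow\; \frac{T'}{T} = 1 + 0.03 \times 0.5 = 1.015
```

So under these assumptions, a 50% slower lock costs about 1.5% of total runtime.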
-Longman
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html