Re: [PATCH v2 1/3] mutex: Make more scalable by doing less atomic operations

Rik van Riel <riel@xxxxxxxxxx> · Mon, 15 Apr 2013 10:45:32 -0400

On 04/15/2013 10:37 AM, Waiman Long wrote:
In the __mutex_lock_common() function, an initial entry into
the lock slow path will cause two atomic_xchg instructions to be
issued. Together with the atomic decrement in the fast path, a total
of three atomic read-modify-write instructions will be issued in
rapid succession. This can cause a lot of cache bouncing when many
tasks are trying to acquire the mutex at the same time.

This patch will reduce the number of atomic_xchg instructions used by
checking the counter value first before issuing the instruction. The
atomic_read() function is just a simple memory read. The atomic_xchg()
function, on the other hand, can be up to 2 order of magnitude or even
more in cost when compared with atomic_read(). By using atomic_read()
to check the value first before calling atomic_xchg(), we can avoid a
lot of unnecessary cache coherency traffic. The only downside with this
change is that a task on the slow path will have a tiny bit
less chance of getting the mutex when competing with another task
in the fast path.

Signed-off-by: Waiman Long <Waiman.Long@xxxxxx>
Reviewed-by: Davidlohr Bueso <davidlohr.bueso@xxxxxx>

Reviewed-by: Rik van Riel <riel@xxxxxxxxxx>

--
All rights reversed
--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html