Re: [PATCH next 2/5] locking/osq_lock: Avoid dirtying the local cpu's 'node' in the osq_lock() fast path.

Boqun Feng <boqun.feng@xxxxxxxxx> · Tue, 2 Jan 2024 10:53:43 -0800

On Sat, Dec 30, 2023 at 03:49:52PM +0000, David Laight wrote:
[...]
> I don't completely understand the 'acquire'/'release' semantics (they didn't
> exist when I started doing SMP kernel code in the late 1980s).
> But it looks odd that osq_unlock()'s fast path uses _release but the very
> similar code in osq_wait_next() uses _acquire.
> 

The _release in osq_unlock() is needed since unlocks are needed to be
RELEASE so that lock+unlock can be a critical section (i.e. no memory
accesses can escape). When osq_wait_next() is used in non unlock cases,
the RELEASE is not required. As for the case where osq_wait_next() is
used in osq_unlock(), there is a xchg() preceding it, which provides a
full barrier, so things are fine.

/me wonders whether we can relax the _acquire in osq_wait_next() into
a _relaxed.

> Indeed, apart from some (assumed) optimisations, I think osq_unlock()
> could just be:
> 	next = osq_wait_next(lock, this_cpu_ptr(&osq_node), 0);
> 	if (next)
> 		next->locked = 1;
> 

If so we need to provide some sort of RELEASE semantics for the
osq_unlock() in all the cases.

Regards,
Boqun

> I don't think the order of the tests for lock->tail and node->next
> matter in osq_wait_next().
> If they were swapped the 'Second most likely case' code from osq_unlock()
> could be removed.
> (The 'uncontended case' doesn't need to load the address of 'node'.)
> 
> 	David
> 		
> 
> -
> Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> Registration No: 1397386 (Wales)