On Wed, 08 Nov 2017, Boqun Feng wrote:
Or worse:

 *   CPU0                                CPU1
 * dlock_list_add()                      dlock_lists_empty()
 *                                         smp_mb__before_atomic();
 *                                         [L] atomic_read(used_lists)
 *   [S] atomic_inc(used_lists);
 *   smp_mb__after_atomic();

, in which case dlock_lists_empty() misses an increment of used_lists.
That said, this may be fine for the usage of dlock_list, but I think the
comments are confusing or wrong. The reason is that you cannot use barriers
to order two memory ops on different CPUs, and usually we add comments
like:
So I think that case is OK, as CPU1 legitimately reads a 0; the problem would
be if we miss the inc because we are doing spin_lock().
    [S] ...             [S] ...
    <barrier>           <barrier>
    [L] ...             [L] ...
That is true.
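As a generic example of that pairing (made-up variables, nothing to do with
dlock_list): each CPU stores to its own variable, issues a full barrier, then
loads the other variable; the two smp_mb()s pair with each other and forbid
the outcome where both CPUs miss the other's store.

#include <linux/compiler.h>
#include <asm/barrier.h>

static int x, y;
static int r0, r1;

static void cpu0(void)
{
	WRITE_ONCE(x, 1);	/* [S] */
	smp_mb();		/* pairs with smp_mb() in cpu1() */
	r0 = READ_ONCE(y);	/* [L] */
}

static void cpu1(void)
{
	WRITE_ONCE(y, 1);	/* [S] */
	smp_mb();		/* pairs with smp_mb() in cpu0() */
	r1 = READ_ONCE(x);	/* [L] */
}

/* r0 == 0 && r1 == 0 is forbidden after both run. */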
Davidlohr, could you try to improve the comments by finding both memory ops preceding and following the barriers? Maybe you will find those barriers are not necessary in the end.
Ok, so for dlock_list_add() the first barrier is there so that the atomic_inc()
is not done inside the critical region, i.e. after head->lock is taken. The
other barriers that follow I put in so that the respective atomic op is done
before the list_add(); however, thinking about it, I don't think they are
really needed.

Regarding dlock_lists_empty(), the smp_mb__before_atomic() is mainly there for
pairing reasons, but at this point I don't see a respective counterpart per
the above diagram, so I'm unclear.
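For reference, a rough sketch of where those barriers sit; the struct layout
and signatures below are guesses for illustration only, not copied from the
patch:

#include <linux/atomic.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct dlock_list_head {
	struct list_head list;
	spinlock_t lock;
};

struct dlock_list_heads {
	struct dlock_list_head *heads;
	atomic_t used_lists;
};

struct dlock_list_node {
	struct list_head list;
	struct dlock_list_head *head;
};

static bool dlock_lists_empty(struct dlock_list_heads *dlist)
{
	/* pairing under discussion; counterpart unclear */
	smp_mb__before_atomic();
	return !atomic_read(&dlist->used_lists);
}

static void dlock_list_add(struct dlock_list_node *node,
			   struct dlock_list_head *head,
			   struct dlock_list_heads *dlist)
{
	/* bump used_lists before taking head->lock ... */
	atomic_inc(&dlist->used_lists);
	/* ... and order the inc before the list_add() below */
	smp_mb__after_atomic();

	spin_lock(&head->lock);
	list_add(&node->list, &head->list);
	spin_unlock(&head->lock);
}

Thanks,
Davidlohr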