Re: [PATCH 3/4] CodeSamples: Fix definition of cmpxchg() in api-gcc.h

On 12/11/18 11:42 PM, Akira Yokosawa wrote:
> From 7e7c3a20d08831cd64b77a4e8d8f693b4725ef89 Mon Sep 17 00:00:00 2001
> From: Akira Yokosawa <akiyks@xxxxxxxxx>
> Date: Tue, 11 Dec 2018 21:37:11 +0900
> Subject: [PATCH 3/4] CodeSamples: Fix definition of cmpxchg() in api-gcc.h
> 
> Do the same change as CodeSamples/formal/litmus/api.h.
> 
> Signed-off-by: Akira Yokosawa <akiyks@xxxxxxxxx>
> ---
>  CodeSamples/api-pthreads/api-gcc.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/CodeSamples/api-pthreads/api-gcc.h b/CodeSamples/api-pthreads/api-gcc.h
> index 3afe340..b66f4b9 100644
> --- a/CodeSamples/api-pthreads/api-gcc.h
> +++ b/CodeSamples/api-pthreads/api-gcc.h
> @@ -168,8 +168,9 @@ struct __xchg_dummy {
>  ({ \
>  	typeof(*ptr) _____actual = (o); \
>  	\
> -	__atomic_compare_exchange_n(ptr, (void *)&_____actual, (n), 1, \
> -			__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST) ? (o) : (o)+1; \
> +	__atomic_compare_exchange_n((ptr), (void *)&_____actual, (n), 0, \
> +			__ATOMIC_SEQ_CST, __ATOMIC_RELAXED); \
> +	_____actual; \
>  })
>  

Hi Akira,

Another reason the performance of cmpxchg is catching up with cmpxchg_weak is that this patch replaces __ATOMIC_SEQ_CST with __ATOMIC_RELAXED as the failure memory order. With __ATOMIC_RELAXED, a failed CAS has relaxed semantics rather than sequentially consistent semantics. Here are some experimental results:

# If __ATOMIC_RELAXED is used for both cmpxchg and cmpxchg_weak

./count_lim_atomic 64 uperf
ns/update: 290

./count_lim_atomic_weak 64 uperf
ns/update: 301


# and then if __ATOMIC_SEQ_CST is used for both cmpxchg and cmpxchg_weak

./count_lim_atomic 64 uperf
ns/update: 316

./count_lim_atomic_weak 64 uperf
ns/update: 302

./count_lim_atomic 120 uperf
ns/update: 630

./count_lim_atomic_weak 120 uperf
ns/update: 568

The results show that if we want to ensure sequential consistency when the CAS primitive fails, cmpxchg_weak performs better than cmpxchg. It seems that the (weak/strong variant, failure_memorder) pair affects performance. I know that PPC uses LL/SC to simulate CAS, but what is the relationship between a simulated CAS and the memory order? This is interesting because, as far as I know, both PPC and ARM use LL/SC to simulate atomic primitives such as CAS and FAA, so FAA might show the same behavior.
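
To make the (variant, failure_memorder) pair concrete, here is a minimal sketch using the GCC builtins. The helper names cas_strong() and cas_weak_retry() are hypothetical and are not the api-gcc.h definitions; both use __ATOMIC_SEQ_CST on success and __ATOMIC_RELAXED on failure, as in the patch. The weak variant may fail spuriously (for example when an LL/SC reservation is lost), so the sketch retries until it either succeeds or observes a value different from the expected one.

```c
#include <assert.h>

/* Hypothetical helper: strong CAS, returns the value observed in *ptr. */
static inline int cas_strong(int *ptr, int old, int new_val)
{
	int actual = old;

	/* 4th argument 0 selects the strong variant (no spurious failure). */
	__atomic_compare_exchange_n(ptr, &actual, new_val, 0,
			__ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
	return actual;
}

/* Hypothetical helper: weak CAS wrapped in a retry loop. */
static inline int cas_weak_retry(int *ptr, int old, int new_val)
{
	int actual = old;

	/* 4th argument 1 selects the weak variant. */
	while (!__atomic_compare_exchange_n(ptr, &actual, new_val, 1,
			__ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
		if (actual != old)
			break;	/* genuine failure: *ptr held another value */
		/* spurious failure: actual still equals old, so retry */
	}
	return actual;
}

/* Single-threaded usage check of both helpers. */
void cas_example(void)
{
	int x = 5;

	assert(cas_strong(&x, 5, 7) == 5 && x == 7);	 /* succeeds */
	assert(cas_weak_retry(&x, 5, 9) == 7 && x == 7); /* fails, x unchanged */
}
```

Changing the last argument of the builtin in both helpers to __ATOMIC_SEQ_CST would give the other configuration compared in the measurements above.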

Actually, I'm not very clear about the meaning of the different failure memory orders. For example, when should we use __ATOMIC_RELAXED rather than __ATOMIC_SEQ_CST if a CAS fails? What happens if __ATOMIC_RELAXED is used on x86? The reference I'm looking at is https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html . Do you know of any other resources about this? I can look into this tomorrow. Thanks.
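
One pattern where a relaxed failure order looks sufficient is a CAS retry loop in which the value returned by a failed CAS only feeds the next attempt and is not used to read data published by another thread. The sketch below is an illustration under that assumption, not code from CodeSamples; add_bounded() is a hypothetical helper that assumes delta >= 0.

```c
#include <assert.h>

/*
 * Hypothetical helper: add delta (assumed >= 0) to *ctr unless the
 * result would exceed limit. Returns 1 on success, 0 otherwise.
 */
static inline int add_bounded(int *ctr, int delta, int limit)
{
	int old = __atomic_load_n(ctr, __ATOMIC_RELAXED);

	do {
		if (old > limit - delta)
			return 0;	/* would exceed the limit */
		/*
		 * On failure, the builtin stores the current value of *ctr
		 * into old. That value is only used for the next attempt,
		 * so the failure order here is __ATOMIC_RELAXED.
		 */
	} while (!__atomic_compare_exchange_n(ctr, &old, old + delta, 1,
			__ATOMIC_SEQ_CST, __ATOMIC_RELAXED));
	return 1;
}

/* Single-threaded usage check. */
void add_bounded_example(void)
{
	int c = 0;

	assert(add_bounded(&c, 5, 10) == 1 && c == 5);
	assert(add_bounded(&c, 6, 10) == 0 && c == 5);
}
```

If the failure path instead went on to dereference data that another thread published before changing the variable, a stronger failure order such as __ATOMIC_ACQUIRE or __ATOMIC_SEQ_CST would presumably be needed.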


--Junchang



>  static __inline__ int atomic_cmpxchg(atomic_t *v, int old, int new)
> 


