On Wed, Mar 22, 2017 at 07:54:04AM -0700, Eric Dumazet wrote:
>
> I guess someone could code a lib/test_refcount.c launching X threads
> using either atomic_inc or refcount_inc() in a loop.
>
> That would give a rough estimate of the refcount_t overhead among
> various platforms.

Cycles spent on uncontended ops:

                                        SKL     SNB     IVB-EP
atomic:       lock incl                 ~15     ~13     ~10
atomic-ref:   call refcount_inc         ~31     ~37     ~31
atomic-ref2:  $inlined                  ~23     ~22     ~21

Contended numbers (E3-1245 v5):

root@skl:~/spinlocks# LOCK=./atomic ./test1.sh
1: 14.797240
2: 87.451230
4: 100.747790
8: 118.234010

root@skl:~/spinlocks# LOCK=./atomic-ref ./test1.sh
1: 30.627320
2: 91.866730
4: 111.029560
8: 141.922420

root@skl:~/spinlocks# LOCK=./atomic-ref2 ./test1.sh
1: 23.243930
2: 98.620250
4: 119.604240
8: 124.864380

The code includes the patches found here:

  https://lkml.kernel.org/r/20170317211918.393791494@xxxxxxxxxxxxx

and effectively does:

  #define REFCOUNT_WARN(cond, str)  WARN_ON_ONCE(cond)

  s/WARN_ONCE/REFCOUNT_WARN/ on lib/refcount.c

(a simplified sketch of the resulting refcount_inc() is included below).

Find the tarball of the userspace code used attached (it's a bit of a
mess; it's grown over time and needs a cleanup); a minimal standalone
sketch of the per-thread loop is also included below. I used:

  gcc (Debian 6.3.0-6) 6.3.0 20170205

So while the refcount_t variant is ~20 cycles worse in the uncontended
case, reducing contention is far more effective than trimming
straight-line instruction count (which is also entirely possible,
because GCC generates absolute shite in places).
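For reference, the refcount_inc() under test ends up roughly like the
below. This is a simplified sketch of the shape of the patched
lib/refcount.c, not a verbatim quote of it; helper names and the exact
warning strings may differ from what the tarball actually builds:

/* sketch: REFCOUNT_WARN drops the format-string setup from the fast path */
#define REFCOUNT_WARN(cond, str)	WARN_ON_ONCE(cond)

static inline bool refcount_inc_not_zero(refcount_t *r)
{
	unsigned int new, val = atomic_read(&r->refs);

	do {
		new = val + 1;

		if (!val)		/* increment on 0: use-after-free */
			return false;

		if (unlikely(!new))	/* already saturated; stay there */
			return true;

	} while (!atomic_try_cmpxchg_relaxed(&r->refs, &val, new));

	REFCOUNT_WARN(new == UINT_MAX, "refcount_t: saturated; leaking memory.\n");

	return true;
}

static inline void refcount_inc(refcount_t *r)
{
	REFCOUNT_WARN(!refcount_inc_not_zero(r),
		      "refcount_t: increment on 0; use-after-free.\n");
}

The extra branches and the cmpxchg loop, versus a single lock incl, are
roughly where the extra uncontended cycles in the table above go.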
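And, purely as an illustration of what the per-thread loop does, here is
a minimal standalone version. This is not the attached code; names like
NITER and ref_inc() are made up for the sketch, and it reports ns/op from
clock_gettime() rather than cycles:

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define NITER		(10 * 1000 * 1000)
#define MAX_THREADS	64

static atomic_uint counter;
static int use_ref;		/* 0: plain atomic inc, 1: refcount-style inc */

/* userspace stand-in for refcount_inc(): saturating cmpxchg loop */
static void ref_inc(atomic_uint *r)
{
	unsigned int new, val = atomic_load_explicit(r, memory_order_relaxed);

	do {
		new = val + 1;
		if (!val)	/* refuse increment on 0 (use-after-free) */
			return;
		if (!new)	/* saturated; don't wrap back to 0 */
			return;
	} while (!atomic_compare_exchange_weak_explicit(r, &val, new,
			memory_order_relaxed, memory_order_relaxed));
}

static void *worker(void *arg)
{
	for (int i = 0; i < NITER; i++) {
		if (use_ref)
			ref_inc(&counter);
		else
			atomic_fetch_add_explicit(&counter, 1,
						  memory_order_relaxed);
	}
	return NULL;
}

int main(int argc, char **argv)
{
	int nthreads = argc > 1 ? atoi(argv[1]) : 1;
	pthread_t tid[MAX_THREADS];
	struct timespec t0, t1;

	use_ref = argc > 2 ? atoi(argv[2]) : 0;
	if (nthreads < 1 || nthreads > MAX_THREADS)
		nthreads = 1;

	atomic_init(&counter, 1);	/* start at 1, like a live refcount */

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < nthreads; i++)
		pthread_create(&tid[i], NULL, worker, NULL);
	for (int i = 0; i < nthreads; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("%d threads: %.2f ns/op\n", nthreads,
	       ns / ((double)nthreads * NITER));
	return 0;
}

Build with something like "gcc -O2 -pthread bench.c" and run as
"./a.out <nthreads> <0|1>" to compare the two increments under
increasing contention, in the spirit of the test1.sh runs above.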
Attachment: spinlocks.tar.bz2