> * Reshetova, Elena <elena.reshetova@xxxxxxxxx> wrote: > > > > Subject: refcount: Out-of-line everything > > > From: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > > > Date: Fri Feb 10 16:27:52 CET 2017 > > > > > > Linus asked to please make this real C code. > > > > Perhaps a completely stupid question, but I am going to ask anyway since only > > this way I can learn. What a real difference it makes? Or are we talking more > > about styling and etc.? Is it because of size concerns? This way it is certainly > > now done differently than rest of atomic and kref... > > It's about generated code size mostly. > > This inline function is way too large to be inlined: > > static inline __refcount_check > bool refcount_add_not_zero(unsigned int i, refcount_t *r) > { > unsigned int old, new, val = atomic_read(&r->refs); > > for (;;) { > if (!val) > return false; > > if (unlikely(val == UINT_MAX)) > return true; > > new = val + i; > if (new < val) > new = UINT_MAX; > old = atomic_cmpxchg_relaxed(&r->refs, val, new); > if (old == val) > break; > > val = old; > } > > REFCOUNT_WARN(new == UINT_MAX, "refcount_t: saturated; > leaking memory.\n"); > > return true; > } > > When used then this function generates this much code on x86-64 defconfig: > > 00000000000084d0 <test>: > 84d0: 8b 0f mov (%rdi),%ecx > 84d2: 55 push %rbp > 84d3: 48 89 e5 mov %rsp,%rbp > > 84d6: 85 c9 test %ecx,%ecx | > 84d8: 74 30 je 850a <test+0x3a> | > 84da: 83 f9 ff cmp $0xffffffff,%ecx | > 84dd: be ff ff ff ff mov $0xffffffff,%esi | > 84e2: 75 04 jne 84e8 <test+0x18> | > 84e4: eb 1d jmp 8503 <test+0x33> | > 84e6: 89 c1 mov %eax,%ecx | > 84e8: 8d 51 01 lea 0x1(%rcx),%edx | > 84eb: 89 c8 mov %ecx,%eax | > 84ed: 39 ca cmp %ecx,%edx | > 84ef: 0f 42 d6 cmovb %esi,%edx | > 84f2: f0 0f b1 17 lock cmpxchg %edx,(%rdi) | > 84f6: 39 c8 cmp %ecx,%eax | > 84f8: 74 09 je 8503 <test+0x33> | > 84fa: 85 c0 test %eax,%eax | > 84fc: 74 0c je 850a <test+0x3a> | > 84fe: 83 f8 ff cmp $0xffffffff,%eax | > 8501: 75 e3 jne 84e6 <test+0x16> | > 8503: b8 01 00 00 00 mov $0x1,%eax | > > 8508: 5d pop %rbp > 8509: c3 retq > 850a: 31 c0 xor %eax,%eax > 850c: 5d pop %rbp > 850d: c3 retq > > > (I've annotated the fastpath impact with '|'. Out of line code generally does not > count.) > > It's about ~50 bytes of code per usage site. It might be better in some cases, but > not by much. > > This is way above any sane inlining threshold. The 'unconditionally good' inlining > threshold is at 1-2 instructions and below ~10 bytes of code. > > So for example refcount_set() and refcount_read() can stay inlined: > > static inline void refcount_set(refcount_t *r, unsigned int n) > { > atomic_set(&r->refs, n); > } > > static inline unsigned int refcount_read(const refcount_t *r) > { > return atomic_read(&r->refs); > } > > > ... beacuse they compile into a single instruction with 2-5 bytes I$ overhead. > > Thanks, > > Ingo Thank you very much Ingo for such detailed and nice explanation! Best Regards, Elena -- To unsubscribe from this list: send the line "unsubscribe linux-tip-commits" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
![]() |