Re: [PATCH v6 0/2] x86: Implement fast refcount overflow protection

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Mon, 24 Jul 2017 14:23:27 +0200

On Mon, Jul 24, 2017 at 10:09:32PM +1000, Michael Ellerman wrote:
> Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> > anyway, and the fact that your LL/SC is horrendously slow in any case.
> 
> Boo :/

:-)

> Just kidding. I suspect you're right that we can probably pack a
> reasonable amount of tests in the body of the LL/SC and not notice.
> 
> > Also, I still haven't seen an actual benchmark where our cmpxchg loop
> > actually regresses anything, just a lot of yelling about potential
> > regressions :/
> 
> Heh yeah. Though I have looked at the code it generates on PPC and it's
> not sleek, though I guess that's not a benchmark, is it :)

Oh for sure, GCC still can't sanely convert a cmpxchg loop (esp. if the
cmpxchg is implemented using asm) into a native LL/SC sequence, so the
generic code will end up looking pretty horrendous.

A native implementation of the same semantics should look loads better.
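For readers outside the thread, "the generic code" here means something like the sketch below. It is a saturating increment written as a compare-and-swap retry loop, loosely modeled on the generic lib/refcount.c of this era. The type and function names (sketch_refcount_t, sketch_refcount_inc_not_zero) are hypothetical, and user-space <stdatomic.h> stands in for the kernel's atomic_t API. On an LL/SC machine each compare-exchange becomes its own load-reserve/store-conditional sequence inside this outer C loop. A native version could instead put the zero and saturation tests directly between the load-reserve and the store-conditional.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's refcount_t. */
typedef struct {
	atomic_uint refs;
} sketch_refcount_t;

/*
 * Increment unless the count is zero; saturate at UINT_MAX instead of
 * wrapping. All the checks sit inside the compare-and-swap retry loop,
 * which is the shape the compiler struggles to turn into one LL/SC body.
 */
static bool sketch_refcount_inc_not_zero(sketch_refcount_t *r)
{
	unsigned int val = atomic_load_explicit(&r->refs, memory_order_relaxed);
	unsigned int next;

	do {
		if (val == 0)
			return false;	/* object is already on its way out */
		if (val == UINT_MAX)
			return true;	/* saturated: stay pinned, never wrap */
		next = val + 1;		/* cannot overflow: val < UINT_MAX here */
		/* On failure, val is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &val, next,
			memory_order_relaxed, memory_order_relaxed));

	return true;
}
```

Starting from a count of 1 the call returns true and leaves 2. At UINT_MAX it returns true and the count stays pinned. At 0 it returns false and the count is left alone.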

One thing that might help you: refcount_dec_and_test() is weaker than
atomic_dec_and_test() wrt ordering (RELEASE vs fully ordered), so a native
version doesn't need the full barrier.
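The ordering difference can be sketched the same way. The hypothetical sketch_refcount_dec_and_test() below does its compare-exchange with memory_order_release only, where atomic_dec_and_test() would act as a full barrier. It is loosely modeled on the generic code, but the exact saturation and underflow checks are assumptions. One deliberate difference: C11 does not honor control dependencies, so the thread that drops the last reference issues an explicit acquire fence before reporting "free it". The kernel code instead relies on a control dependency at that point.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for the kernel's refcount_t. */
typedef struct {
	atomic_uint refs;
} sketch_refcount_t;

/*
 * Decrement; return true when the count hits zero and the caller should
 * free the object. The CAS is RELEASE only: stores made while holding a
 * reference are ordered before the count drops, but this is not a full
 * barrier.
 */
static bool sketch_refcount_dec_and_test(sketch_refcount_t *r)
{
	unsigned int val = atomic_load_explicit(&r->refs, memory_order_relaxed);
	unsigned int next;

	do {
		if (val == UINT_MAX)
			return false;	/* saturated: leak rather than free */
		if (val == 0)
			return false;	/* underflow: would be a use-after-free */
		next = val - 1;
	} while (!atomic_compare_exchange_weak_explicit(&r->refs, &val, next,
			memory_order_release, memory_order_relaxed));

	if (next == 0) {
		/* Pair with the other owners' release decrements before freeing. */
		atomic_thread_fence(memory_order_acquire);
		return true;
	}
	return false;
}
```

From a count of 2, the first call returns false and the second returns true. A saturated count stays at UINT_MAX, and a zero count is never decremented.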