Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> On Mon, Jul 24, 2017 at 04:38:06PM +1000, Michael Ellerman wrote:
>
>> What I'm not entirely clear on is what the best trade off is in terms of
>> overhead vs checks. The summary of behaviour between the fast and full
>> versions you promised Ingo will help there I think.
>
> That's something that's probably completely different for PPC than it is
> for x86.

Yeah definitely. I guess I see the x86 version as a lower bound on the
semantics we'd need to implement and still claim to implement the
refcount stuff.

> Both because your primitive is LL/SC and thus the saturation
> semantics we need a cmpxchg loop for are more natural in your case

Yay!

> anyway, and the fact that your LL/SC is horrendously slow in any case.

Boo :/

Just kidding. I suspect you're right that we can probably pack a
reasonable amount of tests in the body of the LL/SC and not notice.

> Also, I still haven't seen an actual benchmark where our cmpxchg loop
> actually regresses anything, just a lot of yelling about potential
> regressions :/

Heh yeah. Though I have looked at the code it generates on PPC and it's
not sleek, though I guess that's not a benchmark is it :)

cheers
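
For reference, the saturation semantics mentioned above can be sketched in
userspace C11 atomics as a compare-and-swap loop. This is only an
illustration under stated assumptions: the names sketch_refcount,
SKETCH_SATURATED and sketch_inc_not_zero are hypothetical, this is not the
kernel's refcount_t code, and relaxed ordering is used purely to keep the
sketch short.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type for illustration; not the kernel's refcount_t. */
typedef struct {
    atomic_uint refs;
} sketch_refcount;

/* Assumption: the maximum value doubles as the sticky "saturated" marker. */
#define SKETCH_SATURATED UINT_MAX

/*
 * Take a reference unless the count is already zero.
 * Returns false if the object was already dead (count == 0).
 * Once the count reaches SKETCH_SATURATED it is never changed again,
 * so an overflow leaks the object instead of wrapping to zero.
 */
static bool sketch_inc_not_zero(sketch_refcount *r)
{
    assert(r != NULL);

    unsigned int old = atomic_load_explicit(&r->refs, memory_order_relaxed);

    for (;;) {
        if (old == 0)
            return false;   /* refuse to resurrect a freed object */
        if (old == SKETCH_SATURATED)
            return true;    /* saturated: leave the value alone */

        /* On failure, 'old' is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak_explicit(&r->refs, &old, old + 1,
                                                  memory_order_relaxed,
                                                  memory_order_relaxed))
            return true;
    }
}
```

In this form the zero and saturation checks run on a previously loaded
value and the compare-exchange then validates it. A hand-written LL/SC
version could instead fold both checks into the reserve/conditional-store
loop itself, which is the point made in the thread.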