On Mon, Apr 3, 2017 at 4:50 PM, Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>
> POWER does not have an instruction like pause. We can only set current
> thread priority, and current implementations do something like allocate
> issue cycles to threads based on relative priorities. So there should
> be at least one or two issue cycles at low priority, but ideally we
> would not be changing priority in the busy-wait loop because it can
> impact other threads in the core.
>
> I couldn't think of a good way to improve cpu_relax. Our (open source)
> firmware has a cpu_relax, and it puts a bunch of nops between low and
> normal priority instructions so we get some fetch cycles at low prio.
> That isn't ideal though.
>
> If you have any ideas, I'd be open to them.

So the idea would be that maybe we can just make those things explicit.

IOW, instead of having that magical looping construct that does other
magical hidden things as part of the loop, maybe we can just have a

    begin_cpu_relax();
    while (!cond)
        cpu_relax();
    end_cpu_relax();

and then architectures can decide how they implement it. So for x86,
the begin/end macros would be empty. For ppc, maybe begin/end would be
the "lower and raise priority", while cpu_relax() itself is an empty
thing.

Or maybe "begin" just clears a counter, while "cpu_relax()" does some
"increase iterations, and lower priority after X iterations", and then
"end" raises the priority again.

The "do magic having a special loop" approach disturbs me. I'd much
rather have more explicit hooks that allow people to do their own loop
semantics (including having a "return" to exit early).

But that depends on architectures having some pattern that we *can*
abstract. Would some "begin/in-loop/end" pattern like the above be
sufficient? The pure "in-loop" case we have now (ie "cpu_relax()")
clearly isn't sufficient.
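
To make the counter variant concrete, here is a rough userspace sketch
of what I mean. It is not kernel code and not a proposal for the actual
interface: the "struct relax_state" argument, the RELAX_SPIN_THRESHOLD
value and the demo_spin() helper are all made up for illustration, and
the "prio" field only stands in for whatever the hardware thread
priority mechanism would be.

```c
/* Hypothetical: number of spins at normal priority before dropping. */
#define RELAX_SPIN_THRESHOLD 4

enum relax_prio { RELAX_PRIO_NORMAL, RELAX_PRIO_LOW };

/* Hypothetical per-loop state; "prio" models the thread priority. */
struct relax_state {
	unsigned int iterations;
	enum relax_prio prio;
};

/* "begin" just clears the counter and starts at normal priority. */
static void begin_cpu_relax(struct relax_state *rs)
{
	rs->iterations = 0;
	rs->prio = RELAX_PRIO_NORMAL;
}

/* Count iterations, and lower priority once the threshold is reached. */
static void cpu_relax(struct relax_state *rs)
{
	if (++rs->iterations >= RELAX_SPIN_THRESHOLD)
		rs->prio = RELAX_PRIO_LOW;
}

/* "end" raises the priority again, whatever the loop did. */
static void end_cpu_relax(struct relax_state *rs)
{
	rs->prio = RELAX_PRIO_NORMAL;
}

/*
 * Demo caller: spins until a toy condition ("n spins have elapsed")
 * holds, and returns the priority the loop had reached just before
 * end_cpu_relax() restored it.
 */
static enum relax_prio demo_spin(unsigned int n, struct relax_state *rs)
{
	enum relax_prio seen;

	begin_cpu_relax(rs);
	while (rs->iterations < n)	/* stands in for while (!cond) */
		cpu_relax(rs);
	seen = rs->prio;
	end_cpu_relax(rs);
	return seen;
}
```

With something like this, a short wait never touches the priority at
all, a long wait drops it after a few spins, and the caller still owns
the loop, so it can "return" or "break" early as long as it goes
through end_cpu_relax() on the way out. An x86-style implementation
would leave begin/end empty and do the pause in cpu_relax() itself.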

I think s390 might have issues too, since they tried to have that
"cpu_relax_yield" thing (which is only used by stop_machine), and
they've tried cpu_relax_lowlatency() and other games.

              Linus