Catalin Marinas <catalin.marinas@xxxxxxx> writes:
On Wed, Sep 25, 2024 at 04:24:15PM -0700, Ankur Arora wrote:
diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c
index 9b6d90a72601..fc1204426158 100644
--- a/drivers/cpuidle/poll_state.c
+++ b/drivers/cpuidle/poll_state.c
@@ -21,21 +21,20 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev,
raw_local_irq_enable();
if (!current_set_polling_and_test()) {
- unsigned int loop_count = 0;
u64 limit;
limit = cpuidle_poll_time(drv, dev);
while (!need_resched()) {
- cpu_relax();
- if (loop_count++ < POLL_IDLE_RELAX_COUNT)
- continue;
-
- loop_count = 0;
+ unsigned int loop_count = 0;
if (local_clock_noinstr() - time_start > limit) {
dev->poll_time_limit = true;
break;
}
+
+ smp_cond_load_relaxed(¤t_thread_info()->flags,
+ VAL & _TIF_NEED_RESCHED ||
+ loop_count++ >= POLL_IDLE_RELAX_COUNT);
The above is not guaranteed to make progress if _TIF_NEED_RESCHED is
never set. With the event stream enabled on arm64, the WFE will
eventually be woken up, loop_count incremented and the condition would
become true.
That makes sense.
However, the smp_cond_load_relaxed() semantics require that
a different agent updates the variable being waited on, not the waiting
CPU updating it itself.
Right. And, that seems to work well with the semantics of WFE. And,
the event stream (if enabled) has a side effect that allows the exit
from the loop.
Also note that the event stream can be disabled
on arm64 on the kernel command line.
Yes, that's a good point. In patch-11 I tried to address that aspect
by only allowing haltpoll to be force loaded.
But, I guess your point is that its not just haltpoll that has a problem,
but also regular polling -- and maybe the right thing to do would be to
disable polling if the event stream is disabled.
Does the code above break any other architecture?
Me (and others) have so far tested x86, ARM64 (with/without the
event stream), and I believe riscv. I haven't seen any obvious
breakage. But, that's probably because most of the time somebody would
be set TIF_NEED_RESCHED.
I'd say if you want
something like this, better introduce a new smp_cond_load_timeout()
API. The above looks like a hack that may only work on arm64 when the
event stream is enabled.
I had a preliminary version of smp_cond_load_relaxed_timeout() here:
https://lore.kernel.org/lkml/87edae3a1x.fsf@xxxxxxxxxx/
Even with an smp_cond_load_timeout(), we would need to fallback to
something like the above for uarchs without WFxT.
A generic option is udelay() (on arm64 it would use WFE/WFET by
default). Not sure how important it is for poll_idle() but the downside
of udelay() that it won't be able to also poll need_resched() while
waiting for the timeout. If this matters, you could instead make smaller
udelay() calls. Yet another problem, I don't know how energy efficient
udelay() is on x86 vs cpu_relax().
So maybe an smp_cond_load_timeout() would be better, implemented with
cpu_relax() generically and the arm64 would use LDXR, WFE and rely on
the event stream (or fall back to cpu_relax() if the event stream is
disabled).
Yeah, something like that might work.