Catalin Marinas <catalin.marinas@xxxxxxx> writes: > On Wed, Sep 25, 2024 at 04:24:15PM -0700, Ankur Arora wrote: >> diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c >> index 9b6d90a72601..fc1204426158 100644 >> --- a/drivers/cpuidle/poll_state.c >> +++ b/drivers/cpuidle/poll_state.c >> @@ -21,21 +21,20 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev, >> >> raw_local_irq_enable(); >> if (!current_set_polling_and_test()) { >> - unsigned int loop_count = 0; >> u64 limit; >> >> limit = cpuidle_poll_time(drv, dev); >> >> while (!need_resched()) { >> - cpu_relax(); >> - if (loop_count++ < POLL_IDLE_RELAX_COUNT) >> - continue; >> - >> - loop_count = 0; >> + unsigned int loop_count = 0; >> if (local_clock_noinstr() - time_start > limit) { >> dev->poll_time_limit = true; >> break; >> } >> + >> + smp_cond_load_relaxed(¤t_thread_info()->flags, >> + VAL & _TIF_NEED_RESCHED || >> + loop_count++ >= POLL_IDLE_RELAX_COUNT); > > The above is not guaranteed to make progress if _TIF_NEED_RESCHED is > never set. With the event stream enabled on arm64, the WFE will > eventually be woken up, loop_count incremented and the condition would > become true. That makes sense. > However, the smp_cond_load_relaxed() semantics require that > a different agent updates the variable being waited on, not the waiting > CPU updating it itself. Right. And, that seems to work well with the semantics of WFE. And, the event stream (if enabled) has a side effect that allows the exit from the loop. > Also note that the event stream can be disabled > on arm64 on the kernel command line. Yes, that's a good point. In patch-11 I tried to address that aspect by only allowing haltpoll to be force loaded. But, I guess your point is that its not just haltpoll that has a problem, but also regular polling -- and maybe the right thing to do would be to disable polling if the event stream is disabled. > Does the code above break any other architecture? Me (and others) have so far tested x86, ARM64 (with/without the event stream), and I believe riscv. I haven't seen any obvious breakage. But, that's probably because most of the time somebody would be set TIF_NEED_RESCHED. > I'd say if you want > something like this, better introduce a new smp_cond_load_timeout() > API. The above looks like a hack that may only work on arm64 when the > event stream is enabled. I had a preliminary version of smp_cond_load_relaxed_timeout() here: https://lore.kernel.org/lkml/87edae3a1x.fsf@xxxxxxxxxx/ Even with an smp_cond_load_timeout(), we would need to fallback to something like the above for uarchs without WFxT. > A generic option is udelay() (on arm64 it would use WFE/WFET by > default). Not sure how important it is for poll_idle() but the downside > of udelay() that it won't be able to also poll need_resched() while > waiting for the timeout. If this matters, you could instead make smaller > udelay() calls. Yet another problem, I don't know how energy efficient > udelay() is on x86 vs cpu_relax(). > > So maybe an smp_cond_load_timeout() would be better, implemented with > cpu_relax() generically and the arm64 would use LDXR, WFE and rely on > the event stream (or fall back to cpu_relax() if the event stream is > disabled). Yeah, something like that might work. -- ankur