On Wed, Sep 25, 2024 at 04:24:15PM -0700, Ankur Arora wrote: > diff --git a/drivers/cpuidle/poll_state.c b/drivers/cpuidle/poll_state.c > index 9b6d90a72601..fc1204426158 100644 > --- a/drivers/cpuidle/poll_state.c > +++ b/drivers/cpuidle/poll_state.c > @@ -21,21 +21,20 @@ static int __cpuidle poll_idle(struct cpuidle_device *dev, > > raw_local_irq_enable(); > if (!current_set_polling_and_test()) { > - unsigned int loop_count = 0; > u64 limit; > > limit = cpuidle_poll_time(drv, dev); > > while (!need_resched()) { > - cpu_relax(); > - if (loop_count++ < POLL_IDLE_RELAX_COUNT) > - continue; > - > - loop_count = 0; > + unsigned int loop_count = 0; > if (local_clock_noinstr() - time_start > limit) { > dev->poll_time_limit = true; > break; > } > + > + smp_cond_load_relaxed(¤t_thread_info()->flags, > + VAL & _TIF_NEED_RESCHED || > + loop_count++ >= POLL_IDLE_RELAX_COUNT); The above is not guaranteed to make progress if _TIF_NEED_RESCHED is never set. With the event stream enabled on arm64, the WFE will eventually be woken up, loop_count incremented and the condition would become true. However, the smp_cond_load_relaxed() semantics require that a different agent updates the variable being waited on, not the waiting CPU updating it itself. Also note that the event stream can be disabled on arm64 on the kernel command line. Does the code above break any other architecture? I'd say if you want something like this, better introduce a new smp_cond_load_timeout() API. The above looks like a hack that may only work on arm64 when the event stream is enabled. A generic option is udelay() (on arm64 it would use WFE/WFET by default). Not sure how important it is for poll_idle() but the downside of udelay() that it won't be able to also poll need_resched() while waiting for the timeout. If this matters, you could instead make smaller udelay() calls. Yet another problem, I don't know how energy efficient udelay() is on x86 vs cpu_relax(). So maybe an smp_cond_load_timeout() would be better, implemented with cpu_relax() generically and the arm64 would use LDXR, WFE and rely on the event stream (or fall back to cpu_relax() if the event stream is disabled). -- Catalin