Christoph Lameter (Ampere) <cl@xxxxxxxxxx> writes:

> On Tue, 15 Oct 2024, Ankur Arora wrote:
>
>> > Alternatively, if we get an IPI anyway, we can avoid smp_cond_load() and
>> > rely on need_resched() and some new delay/cpu_relax() API that waits for
>> > a timeout or an IPI, whichever comes first. E.g. cpu_relax_timeout()
>> > which on arm64 it's just a simplified version of __delay() without the
>> > 'while' loops.
>>
>> AFAICT when polling (which we are since poll_idle() calls
>> current_set_polling_and_test()), the scheduler will elide the IPI
>> by remotely setting the need-resched bit via set_nr_if_polling().
>
> The scheduler runs on multiple cores. The core on which we are
> running this code puts the core into a wait state so the scheduler does
> not run on this core at all during the wait period.

Yes.

> The other cores may run scheduler functions and set the need_resched bit
> for the core where we are currently waiting.

Yes.

> The other core will wake our core up by sending an IPI. The IPI will
> invoke a scheduler function on our core and the WFE will continue.

Why? The target core is not sleeping. It is *polling* on a memory
address (on arm64, via LDXR; WFE). Ergo an IPI is not needed to tell
it that a need-resched bit is set.

>> Once we stop polling then the scheduler should take the IPI path
>> because call_function_single_prep_ipi() will fail.
>
> The IPI stops the polling. IPI is an interrupt.

Yes an IPI is an interrupt. And, since the target is polling there's
no need for an interrupt to inform it that a memory address on which
it is polling has changed.

resched_curr() is a good example. It only sends the resched IPI if the
target is not polling.

resched_curr() {
        ...
        if (set_nr_and_not_polling(curr))
                smp_send_reschedule(cpu);
        else
                trace_sched_wake_idle_without_ipi(cpu);
}

--
ankur
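
To make the resched_curr() point concrete, here is a rough user-space
model of the decision, not the kernel code. Every model_* name and both
MODEL_* flag values are made up for illustration, and the two counters
merely stand in for smp_send_reschedule() and
trace_sched_wake_idle_without_ipi(). The sketch assumes the need-resched
and polling bits share one flag word, so a single atomic fetch-or can
both set need-resched and observe whether the target was polling.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit values for this model; real kernel values are per-arch. */
#define MODEL_NEED_RESCHED (1u << 0)
#define MODEL_POLLING      (1u << 1)

/* Stand-in for the target CPU's idle-task flag word plus bookkeeping. */
struct model_cpu {
        atomic_uint flags;
        int ipis_sent;          /* counts would-be smp_send_reschedule() calls */
        int wakes_without_ipi;  /* counts wakeups that needed no IPI */
};

static void model_cpu_init(struct model_cpu *c, bool polling)
{
        atomic_init(&c->flags, polling ? MODEL_POLLING : 0u);
        c->ipis_sent = 0;
        c->wakes_without_ipi = 0;
}

/*
 * Atomically set need-resched on the target and report whether the
 * target was NOT polling at that instant (i.e. whether an IPI is needed).
 */
static bool model_set_nr_and_not_polling(struct model_cpu *c)
{
        unsigned int old = atomic_fetch_or(&c->flags, MODEL_NEED_RESCHED);

        return !(old & MODEL_POLLING);
}

/* Waker side: only "send an IPI" if the target is not polling. */
static void model_resched_curr(struct model_cpu *c)
{
        if (model_set_nr_and_not_polling(c))
                c->ipis_sent++;
        else
                c->wakes_without_ipi++;
}

/* Target side: a polling CPU notices the change by re-reading memory. */
static bool model_need_resched(struct model_cpu *c)
{
        return (atomic_load(&c->flags) & MODEL_NEED_RESCHED) != 0;
}
```

Calling model_resched_curr() on a CPU initialised as polling bumps
wakes_without_ipi and leaves ipis_sent at zero, while the poller's next
model_need_resched() read returns true. On a non-polling CPU the same
call bumps ipis_sent instead.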