Hi, This patchset adds support for polling in idle on arm64 via poll_idle() and adds the requisite support to acpi-idle and cpuidle-haltpoll. v10 is a respin of v9 with the timed wait barrier logic (smp_cond_load_relaxed_timewait()) moved out into a separate series [0]. (The barrier patches could also do with some eyes.) Why poll in idle? == The benefit of polling in idle is to reduce the cost (and latency) of remote wakeups. When enabled, these can be done just by setting the need-resched bit, eliding the IPI, and the cost of handling the interrupt on the receiver. Comparing sched-pipe performance on a guest VM: # perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions \ -e sched:sched_wake_idle_without_ipi perf bench sched pipe -l 1000000 --cpu 4 # without polling in idle Performance counter stats for 'CPU(s) 4,5' (5 runs): 25,229.57 msec task-clock # 2.000 CPUs utilized ( +- 7.75% ) 45,821,250,284 cycles # 1.816 GHz ( +- 10.07% ) 26,557,496,665 instructions # 0.58 insn per cycle ( +- 0.21% ) 0 sched:sched_wake_idle_without_ipi # 0.000 /sec 12.615 +- 0.977 seconds time elapsed ( +- 7.75% ) # polling in idle (with haltpoll): Performance counter stats for 'CPU(s) 4,5' (5 runs): 15,131.58 msec task-clock # 2.000 CPUs utilized ( +- 10.00% ) 34,158,188,839 cycles # 2.257 GHz ( +- 6.91% ) 20,824,950,916 instructions # 0.61 insn per cycle ( +- 0.09% ) 1,983,822 sched:sched_wake_idle_without_ipi # 131.105 K/sec ( +- 0.78% ) 7.566 +- 0.756 seconds time elapsed ( +- 10.00% ) Comparing the two cases, there's a significant drop in both cycles and instructions executed. And a signficant drop in the wakeup latency. Tomohiro Misono and Haris Okanovic also report similar latency improvements on Grace and Graviton systems (for v7) [1] [2]. Haris also tested a modified v9 on top of the split out barrier primitives. Lifeng also reports improved context switch latency on a bare-metal machine with acpi-idle [3]. Series layout == - patches 1-3, "cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait()" "cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL" "Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig" switch poll_idle() to using the new barrier interface. Also, do some munging of related kconfig options. - patches 4-5, "arm64: define TIF_POLLING_NRFLAG" "arm64: add support for poll_idle()" add arm64 support for the polling flag and enable poll_idle() support. - patches 6, 7-11, "ACPI: processor_idle: Support polling state for LPI" "cpuidle-haltpoll: define arch_haltpoll_want()" "governors/haltpoll: drop kvm_para_available() check" "cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL" "arm64: idle: export arch_cpu_idle()" "arm64: support cpuidle-haltpoll" add support for polling via acpi-idle, and cpuidle-haltpoll. Changelog == v10: respin of v9 - sent out smp_cond_load_relaxed_timeout() separately [0] - Dropped from this series: "asm-generic: add barrier smp_cond_load_relaxed_timeout()" "arm64: barrier: add support for smp_cond_relaxed_timeout()" "arm64/delay: move some constants out to a separate header" "arm64: support WFET in smp_cond_relaxed_timeout()" - reworded some commit messages v9: - reworked the series to address a comment from Catalin Marinas about how v8 was abusing semantics of smp_cond_load_relaxed(). - add poll_idle() support in acpi-idle (Lifeng Zheng) - dropped some earlier "Tested-by", "Reviewed-by" due to the above rework. v8: No logic changes. Largely respin of v7, with changes noted below: - move selection of ARCH_HAS_OPTIMIZED_POLL on arm64 to its own patch. (patch-9 "arm64: select ARCH_HAS_OPTIMIZED_POLL") - address comments simplifying arm64 support (Will Deacon) (patch-11 "arm64: support cpuidle-haltpoll") v7: No significant logic changes. Mostly a respin of v6. - minor cleanup in poll_idle() (Christoph Lameter) - fixes conflicts due to code movement in arch/arm64/kernel/cpuidle.c (Tomohiro Misono) v6: - reordered the patches to keep poll_idle() and ARCH_HAS_OPTIMIZED_POLL changes together (comment from Christoph Lameter) - threshes out the commit messages a bit more (comments from Christoph Lameter, Sudeep Holla) - also rework selection of cpuidle-haltpoll. Now selected based on the architectural selection of ARCH_CPUIDLE_HALTPOLL. - moved back to arch_haltpoll_want() (comment from Joao Martins) Also, arch_haltpoll_want() now takes the force parameter and is now responsible for the complete selection (or not) of haltpoll. - fixes the build breakage on i386 - fixes the cpuidle-haltpoll module breakage on arm64 (comment from Tomohiro Misono, Haris Okanovic) v5: - rework the poll_idle() loop around smp_cond_load_relaxed() (review comment from Tomohiro Misono.) - also rework selection of cpuidle-haltpoll. Now selected based on the architectural selection of ARCH_CPUIDLE_HALTPOLL. - arch_haltpoll_supported() (renamed from arch_haltpoll_want()) on arm64 now depends on the event-stream being enabled. - limit POLL_IDLE_RELAX_COUNT on arm64 (review comment from Haris Okanovic) - ARCH_HAS_CPU_RELAX is now renamed to ARCH_HAS_OPTIMIZED_POLL. v4 changes from v3: - change 7/8 per Rafael input: drop the parens and use ret for the final check - add 8/8 which renames the guard for building poll_state v3 changes from v2: - fix 1/7 per Petr Mladek - remove ARCH_HAS_CPU_RELAX from arch/x86/Kconfig - add Ack-by from Rafael Wysocki on 2/7 v2 changes from v1: - added patch 7 where we change cpu_relax with smp_cond_load_relaxed per PeterZ (this improves by 50% at least the CPU cycles consumed in the tests above: 10,716,881,137 now vs 14,503,014,257 before) - removed the ifdef from patch 1 per RafaelW Would appreciate any review comments. Ankur [0] https://lore.kernel.org/lkml/20250203214911.898276-1-ankur.a.arora@xxxxxxxxxx/ [1] https://lore.kernel.org/lkml/TY3PR01MB111481E9B0AF263ACC8EA5D4AE5BA2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ [2] https://lore.kernel.org/lkml/104d0ec31cb45477e27273e089402d4205ee4042.camel@xxxxxxxxxx/ [3] https://lore.kernel.org/lkml/f8a1f85b-c4bf-4c38-81bf-728f72a4f2fe@xxxxxxxxxx/ Ankur Arora (6): cpuidle/poll_state: poll via smp_cond_load_relaxed_timewait() cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL arm64: add support for poll_idle() cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL arm64: idle: export arch_cpu_idle() arm64: support cpuidle-haltpoll Joao Martins (4): Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig arm64: define TIF_POLLING_NRFLAG cpuidle-haltpoll: define arch_haltpoll_want() governors/haltpoll: drop kvm_para_available() check Lifeng Zheng (1): ACPI: processor_idle: Support polling state for LPI arch/Kconfig | 3 ++ arch/arm64/Kconfig | 7 ++++ arch/arm64/include/asm/cpuidle_haltpoll.h | 20 +++++++++++ arch/arm64/include/asm/thread_info.h | 2 ++ arch/arm64/kernel/idle.c | 1 + arch/x86/Kconfig | 5 ++- arch/x86/include/asm/cpuidle_haltpoll.h | 1 + arch/x86/kernel/kvm.c | 13 +++++++ drivers/acpi/processor_idle.c | 43 +++++++++++++++++++---- drivers/cpuidle/Kconfig | 5 ++- drivers/cpuidle/Makefile | 2 +- drivers/cpuidle/cpuidle-haltpoll.c | 12 +------ drivers/cpuidle/governors/haltpoll.c | 6 +--- drivers/cpuidle/poll_state.c | 27 +++++--------- drivers/idle/Kconfig | 1 + include/linux/cpuidle.h | 2 +- include/linux/cpuidle_haltpoll.h | 5 +++ 17 files changed, 105 insertions(+), 50 deletions(-) create mode 100644 arch/arm64/include/asm/cpuidle_haltpoll.h -- 2.43.5