Just a ping on this to see if there are any further comments.

Thanks
Ankur

Ankur Arora <ankur.a.arora@xxxxxxxxxx> writes:

> This patchset adds support for polling in idle via poll_idle() on
> arm64.
>
> There are two main changes in this version:
>
> 1. rework the series to take Catalin Marinas' comments on the semantics
>    of smp_cond_load_relaxed() (and how earlier versions of this
>    series were abusing them) into account.
>
>    This also allows dropping of the somewhat strained connections
>    between haltpoll and the event-stream.
>
> 2. earlier versions of this series were adding support for poll_idle()
>    but only using it in the haltpoll driver. Add Lifeng's patch to
>    broaden it out by also polling in acpi-idle.
>
> The benefit of polling in idle is to reduce the cost of remote wakeups.
> When enabled, these can be done just by setting the need-resched bit,
> instead of sending an IPI, and incurring the cost of handling the
> interrupt on the receiver side. When running on a VM it also saves
> the cost of WFE trapping (when enabled.)
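As a rough illustration of the wakeup path described above, here is a minimal userspace sketch in C11. It is not the kernel code from the series: poll_flags_timeout(), wake_poller(), NEED_RESCHED and RELAX_COUNT are hypothetical stand-ins chosen for this example. The polling side spins on a flag word with relaxed loads, reads the clock only once every RELAX_COUNT iterations, and gives up when the timeout expires; the waking side only sets a bit, so no interrupt is involved.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif
#include <stdatomic.h>
#include <stdint.h>
#include <time.h>

#define NEED_RESCHED 0x1u /* hypothetical flag bit set by the waker */
#define RELAX_COUNT  200u /* hypothetical: spins between clock reads */

static uint64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Spin until (*flags & mask) is non-zero or timeout_ns has elapsed.
 * Returns the last value loaded; the caller checks it against mask
 * to tell a wakeup from a timeout.
 */
static unsigned int poll_flags_timeout(atomic_uint *flags, unsigned int mask,
                                       uint64_t timeout_ns)
{
    uint64_t deadline = now_ns() + timeout_ns;
    unsigned int val, spins = 0;

    for (;;) {
        val = atomic_load_explicit(flags, memory_order_relaxed);
        if (val & mask)
            break;
        /* Reading the clock is comparatively costly: do it rarely. */
        if (++spins < RELAX_COUNT)
            continue;
        spins = 0;
        if (now_ns() >= deadline)
            break;
    }
    return val;
}

/* Remote wakeup: set the bit the poller is watching, nothing else. */
static void wake_poller(atomic_uint *flags)
{
    atomic_fetch_or_explicit(flags, NEED_RESCHED, memory_order_release);
}
```

In this sketch a caller would invoke poll_flags_timeout(&flags, NEED_RESCHED, limit_ns) and fall back to a deeper idle state if the returned value does not have NEED_RESCHED set. An architecture-specific version could replace the busy loop with a wait-for-event style instruction; that part is not modelled here.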
>
> Comparing sched-pipe performance on a guest VM:
>
> # perf stat -r 5 --cpu 4,5 -e task-clock,cycles,instructions,sched:sched_wake_idle_without_ipi \
>   perf bench sched pipe -l 1000000 -c 4
>
> # no polling in idle
>
>  Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
>       25,229.57 msec task-clock                          #    2.000 CPUs utilized   ( +-  7.75% )
>  45,821,250,284      cycles                              #    1.816 GHz             ( +- 10.07% )
>  26,557,496,665      instructions                        #    0.58  insn per cycle  ( +-  0.21% )
>               0      sched:sched_wake_idle_without_ipi   #    0.000 /sec
>
>  12.615 +- 0.977 seconds time elapsed  ( +-  7.75% )
>
>
> # polling in idle (with haltpoll):
>
>  Performance counter stats for 'CPU(s) 4,5' (5 runs):
>
>       15,131.58 msec task-clock                          #    2.000 CPUs utilized   ( +- 10.00% )
>  34,158,188,839      cycles                              #    2.257 GHz             ( +-  6.91% )
>  20,824,950,916      instructions                        #    0.61  insn per cycle  ( +-  0.09% )
>       1,983,822      sched:sched_wake_idle_without_ipi   #  131.105 K/sec           ( +-  0.78% )
>
>  7.566 +- 0.756 seconds time elapsed  ( +- 10.00% )
>
> Tomohiro Misono and Haris Okanovic also report similar latency
> improvements on Grace and Graviton systems (for v7) [1] [2].
> Lifeng also reports improved context switch latency on a bare-metal
> machine with acpi-idle [3].
>
> The series is in four parts:
>
>  - patches 1-4,
>
>    "asm-generic: add barrier smp_cond_load_relaxed_timeout()"
>    "cpuidle/poll_state: poll via smp_cond_load_relaxed_timeout()"
>    "cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL"
>    "Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig"
>
>    add smp_cond_load_relaxed_timeout() and switch poll_idle() to
>    using it. Also, do some munging of related kconfig options.
>
>  - patches 5-7,
>
>    "arm64: barrier: add support for smp_cond_relaxed_timeout()"
>    "arm64: define TIF_POLLING_NRFLAG"
>    "arm64: add support for polling in idle"
>
>    add support for the new barrier, the polling flag and enable
>    poll_idle() support.
>
>  - patches 8, 9-13,
>
>    "ACPI: processor_idle: Support polling state for LPI"
>
>    "cpuidle-haltpoll: define arch_haltpoll_want()"
>    "governors/haltpoll: drop kvm_para_available() check"
>    "cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL"
>    "arm64: idle: export arch_cpu_idle"
>    "arm64: support cpuidle-haltpoll"
>
>    add support for polling via acpi-idle, and cpuidle-haltpoll.
>
>  - patches 14, 15,
>    "arm64/delay: move some constants out to a separate header"
>    "arm64: support WFET in smp_cond_relaxed_timeout()"
>
>    are RFC patches to enable WFET support.
>
> Changelog:
>
> v9:
>
>  - reworked the series to address a comment from Catalin Marinas
>    about how v8 was abusing semantics of smp_cond_load_relaxed().
>  - add poll_idle() support in acpi-idle (Lifeng Zheng)
>  - dropped some earlier "Tested-by", "Reviewed-by" due to the
>    above rework.
>
> v8: No logic changes. Largely respin of v7, with changes
> noted below:
>
>  - move selection of ARCH_HAS_OPTIMIZED_POLL on arm64 to its
>    own patch.
>    (patch-9 "arm64: select ARCH_HAS_OPTIMIZED_POLL")
>
>  - address comments simplifying arm64 support (Will Deacon)
>    (patch-11 "arm64: support cpuidle-haltpoll")
>
> v7: No significant logic changes. Mostly a respin of v6.
>
>  - minor cleanup in poll_idle() (Christoph Lameter)
>  - fixes conflicts due to code movement in arch/arm64/kernel/cpuidle.c
>    (Tomohiro Misono)
>
> v6:
>
>  - reordered the patches to keep poll_idle() and ARCH_HAS_OPTIMIZED_POLL
>    changes together (comment from Christoph Lameter)
>  - threshes out the commit messages a bit more (comments from Christoph
>    Lameter, Sudeep Holla)
>  - also rework selection of cpuidle-haltpoll. Now selected based
>    on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
>  - moved back to arch_haltpoll_want() (comment from Joao Martins)
>    Also, arch_haltpoll_want() now takes the force parameter and is
>    now responsible for the complete selection (or not) of haltpoll.
>  - fixes the build breakage on i386
>  - fixes the cpuidle-haltpoll module breakage on arm64 (comment from
>    Tomohiro Misono, Haris Okanovic)
>
> v5:
>  - rework the poll_idle() loop around smp_cond_load_relaxed() (review
>    comment from Tomohiro Misono.)
>  - also rework selection of cpuidle-haltpoll. Now selected based
>    on the architectural selection of ARCH_CPUIDLE_HALTPOLL.
>  - arch_haltpoll_supported() (renamed from arch_haltpoll_want()) on
>    arm64 now depends on the event-stream being enabled.
>  - limit POLL_IDLE_RELAX_COUNT on arm64 (review comment from Haris Okanovic)
>  - ARCH_HAS_CPU_RELAX is now renamed to ARCH_HAS_OPTIMIZED_POLL.
>
> v4 changes from v3:
>  - change 7/8 per Rafael input: drop the parens and use ret for the final check
>  - add 8/8 which renames the guard for building poll_state
>
> v3 changes from v2:
>  - fix 1/7 per Petr Mladek - remove ARCH_HAS_CPU_RELAX from arch/x86/Kconfig
>  - add Ack-by from Rafael Wysocki on 2/7
>
> v2 changes from v1:
>  - added patch 7 where we change cpu_relax with smp_cond_load_relaxed per PeterZ
>    (this improves by 50% at least the CPU cycles consumed in the tests above:
>    10,716,881,137 now vs 14,503,014,257 before)
>  - removed the ifdef from patch 1 per RafaelW
>
> Please review.
>
> [1] https://lore.kernel.org/lkml/TY3PR01MB111481E9B0AF263ACC8EA5D4AE5BA2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> [2] https://lore.kernel.org/lkml/104d0ec31cb45477e27273e089402d4205ee4042.camel@xxxxxxxxxx/
> [3] https://lore.kernel.org/lkml/f8a1f85b-c4bf-4c38-81bf-728f72a4f2fe@xxxxxxxxxx/
>
> Ankur Arora (10):
>   asm-generic: add barrier smp_cond_load_relaxed_timeout()
>   cpuidle/poll_state: poll via smp_cond_load_relaxed_timeout()
>   cpuidle: rename ARCH_HAS_CPU_RELAX to ARCH_HAS_OPTIMIZED_POLL
>   arm64: barrier: add support for smp_cond_relaxed_timeout()
>   arm64: add support for polling in idle
>   cpuidle-haltpoll: condition on ARCH_CPUIDLE_HALTPOLL
>   arm64: idle: export arch_cpu_idle
>   arm64: support cpuidle-haltpoll
>   arm64/delay: move some constants out to a separate header
>   arm64: support WFET in smp_cond_relaxed_timeout()
>
> Joao Martins (4):
>   Kconfig: move ARCH_HAS_OPTIMIZED_POLL to arch/Kconfig
>   arm64: define TIF_POLLING_NRFLAG
>   cpuidle-haltpoll: define arch_haltpoll_want()
>   governors/haltpoll: drop kvm_para_available() check
>
> Lifeng Zheng (1):
>   ACPI: processor_idle: Support polling state for LPI
>
>  arch/Kconfig                              |  3 ++
>  arch/arm64/Kconfig                        |  7 +++
>  arch/arm64/include/asm/barrier.h          | 62 ++++++++++++++++++++++-
>  arch/arm64/include/asm/cmpxchg.h          | 26 ++++++----
>  arch/arm64/include/asm/cpuidle_haltpoll.h | 20 ++++++++
>  arch/arm64/include/asm/delay-const.h      | 25 +++++++++
>  arch/arm64/include/asm/thread_info.h      |  2 +
>  arch/arm64/kernel/idle.c                  |  1 +
>  arch/arm64/lib/delay.c                    | 13 ++---
>  arch/x86/Kconfig                          |  5 +-
>  arch/x86/include/asm/cpuidle_haltpoll.h   |  1 +
>  arch/x86/kernel/kvm.c                     | 13 +++++
>  drivers/acpi/processor_idle.c             | 43 +++++++++++++---
>  drivers/cpuidle/Kconfig                   |  5 +-
>  drivers/cpuidle/Makefile                  |  2 +-
>  drivers/cpuidle/cpuidle-haltpoll.c        | 12 +----
>  drivers/cpuidle/governors/haltpoll.c      |  6 +--
>  drivers/cpuidle/poll_state.c              | 27 +++-------
>  drivers/idle/Kconfig                      |  1 +
>  include/asm-generic/barrier.h             | 42 +++++++++++++++
>  include/linux/cpuidle.h                   |  2 +-
>  include/linux/cpuidle_haltpoll.h          |  5 ++
>  22 files changed, 252 insertions(+), 71 deletions(-)
>  create mode 100644 arch/arm64/include/asm/cpuidle_haltpoll.h
>  create mode 100644 arch/arm64/include/asm/delay-const.h

--
ankur