Marc Zyngier <maz@xxxxxxxxxx> writes:

> On Wed, 16 Oct 2024 22:55:09 +0100,
> Ankur Arora <ankur.a.arora@xxxxxxxxxx> wrote:
>>
>>
>> Marc Zyngier <maz@xxxxxxxxxx> writes:
>>
>> > On Thu, 26 Sep 2024 00:24:14 +0100,
>> > Ankur Arora <ankur.a.arora@xxxxxxxxxx> wrote:
>> >>
>> >> This patchset enables the cpuidle-haltpoll driver and its namesake
>> >> governor on arm64. This is specifically interesting for KVM guests,
>> >> where it reduces IPC latencies.
>> >>
>> >> Comparing idle switching latencies on an arm64 KVM guest with
>> >> perf bench sched pipe:
>> >>
>> >>                                        usecs/op    %stdev
>> >>
>> >>   no haltpoll (baseline)                13.48      +- 5.19%
>> >>   with haltpoll                          6.84      +- 22.07%
>> >>
>> >>
>> >> No change in performance for a similar test on x86:
>> >>
>> >>                                        usecs/op    %stdev
>> >>
>> >>   haltpoll w/ cpu_relax() (baseline)     4.75      +- 1.76%
>> >>   haltpoll w/ smp_cond_load_relaxed()    4.78      +- 2.31%
>> >>
>> >> Both sets of tests were on otherwise idle systems with guest VCPUs
>> >> pinned to specific PCPUs. One reason for the higher stdev on arm64
>> >> is that trapping of the WFE instruction by the host KVM is contingent
>> >> on the number of tasks on the runqueue.
>> >
>> > Sorry to state the obvious, but if the variable trapping of WFI/WFE
>> > is the cause of your trouble, why don't you simply turn it off
>> > (see 0b5afe05377d for the details)? Given that you pin your vcpus to
>> > physical CPUs, there is no need for any trapping.
>>
>> Good point. Thanks. That should help reduce the guessing games around
>> the variance in these tests.
>
> I'd be interested to find out whether there is still some benefit in
> this series once you disable the WFx trapping heuristics.

The benefit of polling in idle is more than just avoiding the cost of
trapping and re-entering. The other benefit is that remote wakeups can
now be done just by setting need-resched, instead of sending an IPI and
incurring the cost of handling the interrupt on the receiver side.
But let me get you some numbers with that.

--
ankur