On 10.09.2015 at 03:55, Wanpeng Li wrote:
> On 9/9/15 9:39 PM, Christian Borntraeger wrote:
>> On 03.09.2015 at 16:07, Wanpeng Li wrote:
>>> v6 -> v7:
>>>  * explicit signal (set a bool)
>>>  * fix the tracepoint
>>>
>>> v5 -> v6:
>>>  * fix wait_ns and poll_ns
>>>
>>> v4 -> v5:
>>>  * set the base case to 10us and the max poll time to 500us
>>>  * handle short/long halts, idea from David, many thanks David
>>>
>>> v3 -> v4:
>>>  * bring back growing vcpu->halt_poll_ns when an interrupt arrives and
>>>    shrinking it when an idle vCPU is detected
>>>
>>> v2 -> v3:
>>>  * grow/shrink vcpu->halt_poll_ns by *halt_poll_ns_grow or /halt_poll_ns_shrink
>>>  * drop the macros and hard-code the numbers in the param definitions
>>>  * update the comment to "5-7 us"
>>>  * remove halt_poll_ns_max and use halt_poll_ns as the max halt_poll_ns time;
>>>    vcpu->halt_poll_ns starts at zero
>>>  * drop the wrappers
>>>  * move the grow/shrink logic before "out:" w/ "if (waited)"
>>>
>>> v1 -> v2:
>>>  * change kvm_vcpu_block to read halt_poll_ns from the vcpu instead of
>>>    the module parameter
>>>  * use the shrink/grow matrix suggested by David
>>>  * set halt_poll_ns_max to 2ms
>>>
>>> Always-poll has a downside: polling still happens for idle vCPUs, which
>>> can waste CPU time. This patchset adds the ability to adjust halt_poll_ns
>>> dynamically: it grows halt_poll_ns when a short halt is detected and
>>> shrinks it when a long halt is detected.
>>>
>>> There are two new kernel parameters for changing halt_poll_ns:
>>> halt_poll_ns_grow and halt_poll_ns_shrink.
>>>
>>>                           no-poll    always-poll    dynamic-poll
>>> -----------------------------------------------------------------------
>>> Idle (nohz) vCPU %c0      0.15%      0.3%           0.2%
>>> Idle (250HZ) vCPU %c0     1.1%       4.6%~14%       1.2%
>>> TCP_RR latency            34us       27us           26.7us
>>>
>>> "Idle (X) vCPU %c0" is the percentage of time the physical CPU spent in
>>> c0 over 60 seconds (each vCPU is pinned to a pCPU). (nohz) means the
>>> guest was tickless.
>>> (250HZ) means the guest was ticking at 250HZ.
>>>
>>> The big win is with ticking operating systems. Running the Linux guest
>>> with nohz=off (and HZ=250), we save 3.4%~12.8% CPU/second and get close
>>> to the no-polling overhead level with dynamic-poll. The savings should
>>> be even higher for higher-frequency ticks.
>>>
>>> Wanpeng Li (3):
>>>   KVM: make halt_poll_ns per-vCPU
>>>   KVM: dynamic halt-polling
>>>   KVM: trace kvm_halt_poll_ns grow/shrink
>>>
>>>  include/linux/kvm_host.h   |  1 +
>>>  include/trace/events/kvm.h | 30 +++++++++++++++++++
>>>  virt/kvm/kvm_main.c        | 72 ++++++++++++++++++++++++++++++++++++++++++----
>>>  3 files changed, 97 insertions(+), 6 deletions(-)
>>>
>> I get some nice improvements for uperf between 2 guests,
>
> Good to hear that.
>
>> but there is one "bug":
>> if there is already some polling ongoing, it's impossible to disable
>> the polling,
>
> Polling will stop once a long halt is detected, so there is no need for
> manual tuning, just as the dynamic PLE window can detect false positives
> and adjust the PLE window suitably.

Yes, but as soon as somebody sets halt_poll_ns to 0, polling will never
stop, because grow and shrink are only handled when halt_poll_ns is != 0:

[...]
	if (halt_poll_ns) {
		if (block_ns <= vcpu->halt_poll_ns)
			;
		/* we had a long block, shrink polling */
		else if (vcpu->halt_poll_ns && block_ns > halt_poll_ns)
			shrink_halt_poll_ns(vcpu);
		/* we had a short halt and our poll time is too small */
		else if (vcpu->halt_poll_ns < halt_poll_ns &&
			 block_ns < halt_poll_ns)
			grow_halt_poll_ns(vcpu);
	}
[...]
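As an aside, for readers following along: grow_halt_poll_ns() and shrink_halt_poll_ns() are not shown in the excerpt above. Based on the cover letter (multiply by halt_poll_ns_grow, divide by halt_poll_ns_shrink, 10us base case, halt_poll_ns acting as the maximum), they behave roughly like the following standalone sketch. This is a model, not the real kvm_main.c code; the default multiplier of 2 and the cap placement inside the grow helper are assumptions here.

```c
#include <assert.h>

/*
 * Standalone model of the grow/shrink helpers; NOT the real kvm_main.c
 * code. Values follow the cover letter: 10us base case, halt_poll_ns as
 * the upper bound, multiply by halt_poll_ns_grow, divide by
 * halt_poll_ns_shrink (a shrink divisor of 0 is modeled as "reset to 0").
 */
static unsigned int halt_poll_ns = 500000;	/* module param: max poll time, ns */
static unsigned int halt_poll_ns_grow = 2;	/* assumed default multiplier */
static unsigned int halt_poll_ns_shrink;	/* 0 here means: shrink straight to 0 */

struct kvm_vcpu {
	unsigned int halt_poll_ns;
};

static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
{
	unsigned int val = vcpu->halt_poll_ns;

	if (val == 0)
		val = 10000;			/* 10us base case */
	else
		val *= halt_poll_ns_grow;

	if (val > halt_poll_ns)			/* halt_poll_ns acts as the max */
		val = halt_poll_ns;
	vcpu->halt_poll_ns = val;
}

static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
{
	if (halt_poll_ns_shrink == 0)
		vcpu->halt_poll_ns = 0;
	else
		vcpu->halt_poll_ns /= halt_poll_ns_shrink;
}
```

With these defaults, a vCPU goes 0 -> 10000 -> 20000 -> ... -> 500000 ns on consecutive short halts, and drops back to 0 on a single long halt.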
so maybe just do something like:

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4662a88..48828d6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2012,6 +2012,8 @@ out:
 		else if (vcpu->halt_poll_ns < halt_poll_ns &&
 			 block_ns < halt_poll_ns)
 			grow_halt_poll_ns(vcpu);
+	} else {
+		vcpu->halt_poll_ns = 0;
 	}
 
 	trace_kvm_vcpu_wakeup(block_ns, waited);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
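P.S. To see what the extra "else" branch buys, here is a minimal, self-contained model of the end-of-block adjustment (the grow/shrink helpers below are simplified stand-ins, not the patch's code, and adjust_poll is a hypothetical name for the logic that lives at the end of kvm_vcpu_block): without the reset, a vCPU that has already grown its poll window keeps it after an admin writes 0 to halt_poll_ns; with it, polling stops on the next wakeup.

```c
#include <assert.h>

static unsigned int halt_poll_ns = 500000;	/* module parameter */

struct kvm_vcpu {
	unsigned int halt_poll_ns;
};

/* Simplified stand-ins for the patch's helpers. */
static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
{
	vcpu->halt_poll_ns = 0;
}

static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
{
	vcpu->halt_poll_ns = vcpu->halt_poll_ns ? vcpu->halt_poll_ns * 2
						: 10000;
	if (vcpu->halt_poll_ns > halt_poll_ns)
		vcpu->halt_poll_ns = halt_poll_ns;
}

/* End-of-block adjustment, with the proposed "else" reset applied. */
static void adjust_poll(struct kvm_vcpu *vcpu, unsigned int block_ns)
{
	if (halt_poll_ns) {
		if (block_ns <= vcpu->halt_poll_ns)
			;			/* poll window was big enough */
		else if (vcpu->halt_poll_ns && block_ns > halt_poll_ns)
			shrink_halt_poll_ns(vcpu);	/* long block: shrink */
		else if (vcpu->halt_poll_ns < halt_poll_ns &&
			 block_ns < halt_poll_ns)
			grow_halt_poll_ns(vcpu);	/* short halt: grow */
	} else {
		/* proposed fix: polling disabled globally, stop this vCPU too */
		vcpu->halt_poll_ns = 0;
	}
}
```

Without the else branch, the second call below would leave vcpu->halt_poll_ns at its stale grown value and the vCPU would poll forever.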