Some latency-sensitive workloads see an obvious performance drop when
running inside a VM. The main reason is that overheads are amplified
inside a VM; the largest cost I have seen is in the idle path. This
series introduces a new mechanism that polls for a while before
entering the real idle state. If a reschedule becomes pending during
the poll, we skip the heavy overhead path entirely.

Here is the data we get when running the contextswitch benchmark to
measure latency (lower is better):

1. w/o patch:
   2493.14 ns/ctxsw -- 200.3 %CPU

2. w/ patch:
   halt_poll_threshold=10000  -- 1485.96 ns/ctxsw -- 201.0 %CPU
   halt_poll_threshold=20000  -- 1391.26 ns/ctxsw -- 200.7 %CPU
   halt_poll_threshold=30000  -- 1488.55 ns/ctxsw -- 200.1 %CPU
   halt_poll_threshold=500000 -- 1159.14 ns/ctxsw -- 201.5 %CPU

3. kvm dynamic poll:
   halt_poll_ns=10000  -- 2296.11 ns/ctxsw -- 201.2 %CPU
   halt_poll_ns=20000  -- 2599.7  ns/ctxsw -- 201.7 %CPU
   halt_poll_ns=30000  -- 2588.68 ns/ctxsw -- 211.6 %CPU
   halt_poll_ns=500000 -- 2423.20 ns/ctxsw -- 229.2 %CPU

4. idle=poll:
   2050.1 ns/ctxsw -- 1003 %CPU

5. idle=mwait:
   2188.06 ns/ctxsw -- 206.3 %CPU

Here is the data we get when running the netperf benchmark:

1. w/o patch:
   14556.8 bits/s -- 144.2 %CPU

2. w/ patch:
   halt_poll_threshold=10000 -- 15803.89 bits/s -- 159.5 %CPU
   halt_poll_threshold=20000 -- 15899.04 bits/s -- 161.5 %CPU
   halt_poll_threshold=30000 -- 15642.38 bits/s -- 161.8 %CPU
   halt_poll_threshold=40000 -- 18040.76 bits/s -- 184.0 %CPU
   halt_poll_threshold=50000 -- 18877.61 bits/s -- 197.3 %CPU

3. kvm dynamic poll:
   halt_poll_ns=10000 -- 15876.00 bits/s -- 172.2 %CPU
   halt_poll_ns=20000 -- 15602.58 bits/s -- 185.4 %CPU
   halt_poll_ns=30000 -- 15930.69 bits/s -- 194.4 %CPU
   halt_poll_ns=40000 -- 16413.09 bits/s -- 195.3 %CPU
   halt_poll_ns=50000 -- 16417.42 bits/s -- 196.3 %CPU

4. idle=poll in guest:
   18441.3 bits/s -- 1003 %CPU

5. idle=mwait in guest:
   15760.6 bits/s -- 157.6 %CPU

V1 -> V2:
- integrate the smart halt poll into the paravirt code
- use idle_stamp instead of check_poll
- since it is hard to tell whether the vcpu is the only task on its
  pcpu, we do not consider that case in this series (may improve it
  in the future)

Yang Zhang (7):
  x86/paravirt: Add pv_idle_ops to paravirt ops
  KVM guest: register kvm_idle_poll for pv_idle_ops
  sched/idle: Add poll before enter real idle path
  x86/paravirt: Add update in x86/paravirt pv_idle_ops
  Documentation: Add three sysctls for smart idle poll
  KVM guest: introduce smart idle poll algorithm
  sched/idle: update poll time when wakeup from idle

 Documentation/sysctl/kernel.txt       | 25 +++++++++++++
 arch/x86/include/asm/paravirt.h       | 10 ++++++
 arch/x86/include/asm/paravirt_types.h |  7 ++++
 arch/x86/kernel/kvm.c                 | 67 +++++++++++++++++++++++++++++++++++
 arch/x86/kernel/paravirt.c            | 11 ++++++
 arch/x86/kernel/process.c             |  7 ++++
 include/linux/kernel.h                |  6 ++++
 include/linux/sched/idle.h            |  4 +++
 kernel/sched/core.c                   |  4 +++
 kernel/sched/idle.c                   |  9 +++++
 kernel/sysctl.c                       | 23 ++++++++++++
 11 files changed, 173 insertions(+)

--
1.8.3.1
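
P.S. For readers skimming the series, here is a minimal sketch of the
core polling idea. The names smart_idle_poll() and poll_threshold_ns
are hypothetical and simplified for illustration; the actual patches
wire the poll into pv_idle_ops and adjust the window adaptively via
the new sysctls:

    #include <linux/ktime.h>     /* ktime_get(), ktime_to_ns() */
    #include <linux/sched.h>     /* need_resched() */
    #include <linux/types.h>     /* u64 */
    #include <asm/processor.h>   /* cpu_relax() */

    /* Hypothetical tunable: how long to spin before really halting. */
    static unsigned long poll_threshold_ns = 20000;

    static void smart_idle_poll(void)
    {
            u64 start, now;

            if (!poll_threshold_ns)
                    return;

            start = ktime_to_ns(ktime_get());
            do {
                    /*
                     * A task became runnable while we were spinning:
                     * return so the scheduler can run it, skipping the
                     * halt (and the VM exit it causes in a guest).
                     */
                    if (need_resched())
                            break;
                    cpu_relax();
                    now = ktime_to_ns(ktime_get());
            } while (now - start < poll_threshold_ns);
    }

The idle path would call this just before the architectural halt: when
the poll ends because need_resched() fired, the expensive halt/wakeup
round trip is avoided, at the cost of the CPU time spent spinning
(visible as the slightly higher %CPU in the numbers above).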