On Mon, Apr 23, 2018 at 07:15:58PM +0200, Paolo Bonzini wrote:
On 22/04/2018 02:53, Anthoine Bourgeois wrote:
Since the commit "8003c9ae204e: add APIC Timer periodic/oneshot mode VMX
preemption timer support", a Windows 10 guest has some erratic timer
spikes after a few hours. The spikes grow larger as the uptime of the VM
increases.
Here are the results for 150000 iterations of a 1ms timer without any load:
            Before 8003c9ae204e | After 8003c9ae204e
Max         1834us              | 86000us
Mean        1100us              | 1021us
Deviation   59us                | 149us
Here are the results for 150000 iterations of a 1ms timer with a cpu-z stress test:
            Before 8003c9ae204e | After 8003c9ae204e
Max         32000us             | 140000us
Mean        1006us              | 1997us
Deviation   140us               | 11095us
The current patch partially reverts the previous commit by removing the
target expiration tracking and going back to the straight hrtimer calls. The
APIC Timer periodic/oneshot mode support is kept because it is necessary
for the new Windows Spring update.
v2: Check if the tsc deadline is already expired. Thank you Mika.
Cc: Mika Penttilä <mika.penttila@xxxxxxxxxxxx>
Signed-off-by: Anthoine Bourgeois <anthoine.bourgeois@xxxxxxxxxxxxxxx>
---
arch/x86/kvm/lapic.c | 57 +++++++++++++++++++++++++---------------------------
arch/x86/kvm/lapic.h | 1 -
2 files changed, 27 insertions(+), 31 deletions(-)
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 70dcb5548022..8b5c2a69a3b6 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1173,7 +1173,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
static u32 apic_get_tmcct(struct kvm_lapic *apic)
{
- ktime_t remaining, now;
+ ktime_t remaining;
s64 ns;
u32 tmcct;
@@ -1184,8 +1184,7 @@ static u32 apic_get_tmcct(struct kvm_lapic *apic)
apic->lapic_timer.period == 0)
return 0;
- now = ktime_get();
- remaining = ktime_sub(apic->lapic_timer.target_expiration, now);
+ remaining = hrtimer_get_remaining(&apic->lapic_timer.timer);
I'm confused, how can this work when the preemption timer is in use
(vcpu->arch.apic->lapic_timer.hv_timer_in_use is true)?
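To make the question concrete, here is a minimal userspace sketch, not the kernel code. It assumes that the hrtimer is not reprogrammed while the preemption timer is armed, so the expiry it stores can be stale. struct model_timer and both helper functions are hypothetical names invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of the timer state; NOT the kernel structures. */
struct model_timer {
    int64_t hrtimer_expires_ns;   /* last expiry programmed into the software timer */
    int64_t target_expiration_ns; /* deadline tracked independently of the backend */
    bool    hv_timer_in_use;      /* true when the hardware timer is armed instead */
};

/* Remaining time read from the software timer only (the approach in the hunk). */
static inline int64_t remaining_from_hrtimer(const struct model_timer *t,
                                             int64_t now_ns)
{
    return t->hrtimer_expires_ns - now_ns;
}

/* Remaining time read from the tracked deadline (the approach being removed). */
static inline int64_t remaining_from_target(const struct model_timer *t,
                                            int64_t now_ns)
{
    return t->target_expiration_ns - now_ns;
}
```

In this model, if hv_timer_in_use is true and hrtimer_expires_ns was last written for an earlier period, remaining_from_hrtimer() returns a value unrelated to the current deadline (possibly negative), while remaining_from_target() still follows it.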
I don't really know; this hunk is only a revert that works for me.
I'm still looking for the root cause. My guess is that the
target_expiration variable is sometimes miscomputed.
What I see is that the spikes grow linearly over time, at a rate of 1ms
more every 1 minute 30 seconds.
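As a rough sanity check of that rate, the arithmetic can be sketched as below. The two constants come from the observation above. The assumption that the error grows linearly from zero at boot, and the helper names drift_ppm and projected_error_ns, are illustrative only.

```c
#include <stdint.h>

/* Observed rate: the spikes grow by 1 ms every 90 s of uptime. */
#define DRIFT_NS_PER_STEP 1000000LL /* 1 ms, in nanoseconds */
#define STEP_SECONDS      90LL      /* 1 minute 30 seconds */

/* Drift expressed in parts per million of elapsed time. */
static inline double drift_ppm(void)
{
    /* (1e6 ns / 90e9 ns) * 1e6 */
    return (double)DRIFT_NS_PER_STEP / (STEP_SECONDS * 1e9) * 1e6;
}

/*
 * Projected accumulated error in nanoseconds after uptime_s seconds.
 * ASSUMPTION: the error starts at zero and grows strictly linearly.
 */
static inline int64_t projected_error_ns(int64_t uptime_s)
{
    return uptime_s * DRIFT_NS_PER_STEP / STEP_SECONDS;
}
```

Under these assumptions, 1ms per 90s is roughly 11 ppm, that is about 11ns of error per 1ms period, and projected_error_ns(3600) gives 40ms after one hour of uptime.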
Anthoine