----- Original Message -----
> From: "Yunhong Jiang" <yunhong.jiang@xxxxxxxxx>
> To: "Paolo Bonzini" <pbonzini@xxxxxxxxxx>, "Yunhong Jiang" <yunhong.jiang@xxxxxxxxxxxxxxx>, kvm@xxxxxxxxxxxxxxx
> Cc: rkrcmar@xxxxxxxxxx
> Sent: Saturday, May 21, 2016 12:06:16 AM
> Subject: RE: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc deadline timer
>
> > -----Original Message-----
> > From: kvm-owner@xxxxxxxxxxxxxxx [mailto:kvm-owner@xxxxxxxxxxxxxxx] On
> > Behalf Of Paolo Bonzini
> > Sent: Friday, May 20, 2016 3:34 AM
> > To: Yunhong Jiang <yunhong.jiang@xxxxxxxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx
> > Cc: rkrcmar@xxxxxxxxxx
> > Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for tsc
> > deadline timer
> >
> > On 20/05/2016 03:45, Yunhong Jiang wrote:
> > > From: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
> > >
> > > Utilize the VMX preemption timer for TSC deadline timer
> > > virtualization. The VMX preemption timer is armed while the vCPU is
> > > running, and a VMExit happens if the virtual TSC deadline timer
> > > expires.
> > >
> > > When the vCPU thread is scheduled out, TSC deadline timer
> > > virtualization is switched back to the current solution, i.e.
> > > emulation with an hrtimer. It is switched to the VMX preemption
> > > timer again when the vCPU thread is scheduled in.
> > >
> > > This solution avoids going through the OS's complex hrtimer system,
> > > and also the cost of host timer interrupt handling, at the price of
> > > a preemption_timer VMExit. It fits well for NFV-style scenarios
> > > where the vCPU is bound to a pCPU and the pCPU is isolated, or
> > > similar setups.
> > >
> > > However, it may hurt performance if the vCPU thread is scheduled
> > > in/out very frequently, because it then switches to/from the hrtimer
> > > emulation a lot.
> > >
> > > Signed-off-by: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
> > > ---
> > >  arch/x86/kvm/lapic.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++--
> > >  arch/x86/kvm/lapic.h |  10 +++++
> > >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
> > >  arch/x86/kvm/x86.c   |   6 +++
> > >  4 files changed, 147 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 5776473be362..a613bcfda59a 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> > >
> > >  	local_irq_disable();
> > >
> > > +	inject_expired_hwemul_timer(vcpu);
> >
> > Is this really fast enough (and does it trigger often enough) that it is
> > worth slowing down all vmenters?
> >
> > I'd rather call inject_expired_hwemul_timer from the preemption timer
> > vmexit handler instead.  inject_pending_hwemul_timer will set the
> > preemption timer countdown to zero if the deadline of the guest LAPIC
> > timer has passed already.  This should be relatively rare.
>
> Sure, I will do it that way in the new patch set. Let me explain why it
> is done this way now. Originally this patch targeted running cyclictest
> in the guest with latency below 15us over a 24-hour run. So, if the
> timer has already expired before VM entry, we inject it immediately
> instead of waiting for an extra VMExit, which may cost 4~5 us.

This seems too much... A vmexit+vmentry on Ivy Bridge or newer is around
1200-1500 cycles (roughly 0.5 us at 3 GHz), so it should be 1-2
microseconds at most, including the time to inject the interrupt.

There are a few more ideas that I have about optimizing the preemption
timer; hopefully we can get it down to that and not pessimize the
sched_out/sched_in case.
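To make the flow suggested in the quoted paragraph concrete, here is a
minimal sketch, not the actual patch: the two helper names are the ones
used in the series, while preemption_timer_scale, the direct VMCS
writes, and the guest/host TSC handling are simplifying assumptions (the
real code would route the VMCS access through a lapic.c -> vmx.c hook
and convert through the guest's TSC offset).

  /* MSR_IA32_VMX_MISC bits 4:0 give the TSC-to-timer shift; how it is
   * read and cached is assumed here. */
  static u32 preemption_timer_scale;

  /* Rare exit: the countdown reached zero while the guest was running,
   * so the guest's TSC deadline has passed.  Inject the LAPIC timer
   * interrupt here instead of testing on every vmentry. */
  static int handle_preemption_timer(struct kvm_vcpu *vcpu)
  {
          inject_expired_hwemul_timer(vcpu);
          return 1;
  }

  /* vmentry path: arm the countdown from the guest's TSC deadline. */
  static void inject_pending_hwemul_timer(struct kvm_vcpu *vcpu)
  {
          u64 tscl = rdtsc();     /* assumes guest TSC == host TSC */
          u64 deadline = vcpu->arch.apic->lapic_timer.tscdeadline;

          if (deadline <= tscl)
                  /* Deadline already passed: a zero countdown exits
                   * immediately after vmentry, so a late expiry costs
                   * one exit instead of a check on every entry. */
                  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, 0);
          else
                  /* The preemption timer ticks at the TSC rate shifted
                   * right by a CPU-specific factor. */
                  vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
                               (deadline - tscl) >> preemption_timer_scale);
  }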
Instead, I think what we want to touch is the blocking/unblocking
callback.  Wanpeng Li's patches to handle the APIC timer specially in
kvm_vcpu_block could help with this too (a rough sketch follows below).
However, there's time for that.

Please keep sched_out/sched_in in your next submission, and we can work
on it a step at a time.

Thanks,

Paolo
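Following up on the blocking/unblocking idea above, a rough sketch:
kvm_arch_vcpu_blocking/kvm_arch_vcpu_unblocking are the generic KVM
hooks, while switch_to_sw_timer/switch_to_hw_timer are hypothetical
names for the switching logic the series already runs from
sched_out/sched_in.

  /* Move the hrtimer handoff from sched_out/sched_in to the halt path,
   * so only a vCPU that actually blocks pays for the switch. */
  void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
  {
          /* The preemption timer only counts down in VMX non-root
           * mode, so a halted vCPU needs the hrtimer to fire. */
          switch_to_sw_timer(vcpu);
  }

  void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
  {
          /* Runnable again: cancel the hrtimer; the next vmentry
           * re-arms the preemption timer. */
          switch_to_hw_timer(vcpu);
  }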