2016-05-24 9:20 GMT+08:00 yunhong jiang <yunhong.jiang@xxxxxxxxxxxxxxx>:
> On Tue, 24 May 2016 09:16:03 +0800
> Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>
>> 2016-05-24 8:55 GMT+08:00 yunhong jiang <yunhong.jiang@xxxxxxxxxxxxxxx>:
>> > On Tue, 24 May 2016 08:53:14 +0800
>> > Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>> >
>> >> 2016-05-24 6:58 GMT+08:00 yunhong jiang <yunhong.jiang@xxxxxxxxxxxxxxx>:
>> >> > On Sun, 22 May 2016 08:21:50 +0800
>> >> > Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
>> >> >
>> >> >> 2016-05-21 6:06 GMT+08:00 Jiang, Yunhong <yunhong.jiang@xxxxxxxxx>:
>> >> >> >
>> >> >> >> -----Original Message-----
>> >> >> >> From: kvm-owner@xxxxxxxxxxxxxxx [mailto:kvm-owner@xxxxxxxxxxxxxxx]
>> >> >> >> On Behalf Of Paolo Bonzini
>> >> >> >> Sent: Friday, May 20, 2016 3:34 AM
>> >> >> >> To: Yunhong Jiang <yunhong.jiang@xxxxxxxxxxxxxxx>; kvm@xxxxxxxxxxxxxxx
>> >> >> >> Cc: rkrcmar@xxxxxxxxxx
>> >> >> >> Subject: Re: [RFC PATCH 4/5] Utilize the vmx preemption timer for
>> >> >> >> tsc deadline timer
>> >> >> >>
>> >> >> >> On 20/05/2016 03:45, Yunhong Jiang wrote:
>> >> >> >> > From: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
>> >> >> >> >
>> >> >> >> > Utilize the VMX preemption timer for TSC deadline timer
>> >> >> >> > virtualization. The VMX preemption timer is armed while the
>> >> >> >> > vCPU is running, and a VM exit happens when the virtual
>> >> >> >> > TSC deadline timer expires.
>> >> >> >> >
>> >> >> >> > When the vCPU thread is scheduled out, TSC deadline timer
>> >> >> >> > virtualization switches back to the current solution,
>> >> >> >> > i.e. it uses an hrtimer. It switches back to the VMX
>> >> >> >> > preemption timer when the vCPU thread is scheduled in.
>> >> >> >> >
>> >> >> >> > This solution avoids the complexity of the OS's hrtimer
>> >> >> >> > system and the host timer interrupt handling cost, at the
>> >> >> >> > price of a preemption-timer VM exit. It fits well for NFV
>> >> >> >> > usage scenarios where the vCPU is bound to a pCPU and the
>> >> >> >> > pCPU is isolated, and for similar scenarios.
>> >> >> >> >
>> >> >> >> > However, it may have an impact if the vCPU thread is
>> >> >> >> > scheduled in/out very frequently, because it then switches
>> >> >> >> > to/from the hrtimer emulation a lot.
>> >> >> >> >
>> >> >> >> > Signed-off-by: Yunhong Jiang <yunhong.jiang@xxxxxxxxx>
>> >> >> >> > ---
>> >> >> >> >  arch/x86/kvm/lapic.c | 108 +++++++++++++++++++++++++++++++++++++++++++++++++--
>> >> >> >> >  arch/x86/kvm/lapic.h |  10 +++++
>> >> >> >> >  arch/x86/kvm/vmx.c   |  26 +++++++++++++
>> >> >> >> >  arch/x86/kvm/x86.c   |   6 +++
>> >> >> >> >  4 files changed, 147 insertions(+), 3 deletions(-)
>> >> >> >> >
>> >> >> >> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> >> >> >> > index 5776473be362..a613bcfda59a 100644
>> >> >> >> > --- a/arch/x86/kvm/x86.c
>> >> >> >> > +++ b/arch/x86/kvm/x86.c
>> >> >> >> > @@ -6608,6 +6608,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> >> >> >> >
>> >> >> >> >  	local_irq_disable();
>> >> >> >> >
>> >> >> >> > +	inject_expired_hwemul_timer(vcpu);
>> >> >> >>
>> >> >> >> Is this really fast enough (and does it trigger often enough)
>> >> >> >> that it is worth slowing down all vmenters?
>> >> >> >>
>> >> >> >> I'd rather call inject_expired_hwemul_timer from the
>> >> >> >> preemption timer vmexit handler instead.
>> >> >> >> inject_pending_hwemul_timer will set the preemption timer
>> >> >> >> countdown to zero if the deadline of the guest LAPIC timer
>> >> >> >> has passed already. This should be relatively rare.
>> >> >> >
>> >> >> > Sure, and I will take that approach in the new patch set. Let me
>> >> >> > give some reason why it is done this way now. Originally this
>> >> >> > patch was for running cyclictest in the guest with latency below
>> >> >> > 15us for 24 hours. So, if the timer has already expired before
>> >> >> > VM entry, we try to inject it immediately instead of waiting for
>> >> >> > an extra VM exit, which may cost 4~5 us.
>> >> >>
>> >> >> inject_expired_hwemul_timer() just sets the pending bit and
>> >> >> still needs a vmexit to finally return to vcpu_run(), which is
>> >> >> the only place that checks the pending bit and injects APIC_LVTT,
>> >> >> so why does adding inject_expired_hwemul_timer() to
>> >> >> vcpu_enter_guest() avoid an extra vmexit?
>> >> >
>> >> > inject_expired_hwemul_timer() invokes kvm_make_request(), which
>> >> > causes KVM to try to inject the timer interrupt directly. Please
>> >> > note that vcpu_enter_guest() rechecks the requests later.
>> >>
>> >> Actually I didn't find another place that injects pending timer
>> >> irqs except in vcpu_run(), though kvm_set_pending_timer() mentions
>> >> that this is implicitly checked in vcpu_enter_guest().
>> >
>> > Hi, Wanpeng, thanks for checking. Please have a look at the changes
>> > to arch/x86/kvm/x86.c in the patch; inject_expired_hwemul_timer()
>> > is called twice there.
>>
>> Yes.
>>
>> >
>> > Of course, per Paolo's review, this code path will be removed in
>> > the next submission.
>>
>> So my question still is why no timer irq is injected in
>> vcpu_enter_guest(), though kvm_set_pending_timer() mentions that
>> this is implicitly checked in vcpu_enter_guest().
>
> Do you mean the code at
> http://lxr.free-electrons.com/source/arch/x86/kvm/x86.c#L6607 ?
> It will check whether there are any pending events and, if yes, it
> will exit.

I see, thanks. ;-)

Regards,
Wanpeng Li
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html