On Tue, May 14, 2019 at 06:56:04PM +0800, Wanpeng Li wrote:
> On Tue, 14 May 2019 at 09:45, Wanpeng Li <kernellwp@xxxxxxxxx> wrote:
> >
> > On Tue, 14 May 2019 at 03:54, Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> > > Rather than reinvent the wheel, can we simply move the call to
> > > wait_lapic_expire() into vmx.c and svm.c? For VMX we'd probably want to
> > > support the advancement if enable_unrestricted_guest=true so that we avoid
> > > the emulation_required case, but other than that I don't see anything that
> > > requires wait_lapic_expire() to be called where it is.
> >
> > I also considered to move wait_lapic_expire() into vmx.c and svm.c
> > before, what do you think, Paolo, Radim?
>
> However, guest_enter_irqoff() also prevents this. Otherwise, we will
> account busy wait time as guest time. How about sampling several times
> and get the average value or conservative min value to handle Sean's
> concern?

Hmm, looking at the history, wait_lapic_expire() was originally called
immediately before kvm_x86_ops->run()[1]. The call was moved above
guest_enter_irqoff() because of its tracepoint, which violated the RCU
extended quiescent state invoked by guest_enter_irqoff()[2][3]. In other
words, I don't think there is a fundamental issue with accounting the busy
wait time to the guest rather than the host.

Assuming the tracepoint was added to help tune the advancement time, I
think we can simply remove the tracepoint, which would allow moving
wait_lapic_expire(). Now that the advancement time is tracked per-vCPU,
realizing a change in the advancement time requires creating a new VM.
For all intents and purposes this makes it impractical to hand tune the
advancement in real time using the tracepoint as the feedback mechanism.

If we want to expose the per-vCPU advancement time to the user, a debugfs
entry is likely sufficient given that the advancement time is automatically
adjusted.

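To make the ordering concrete, here is a rough userspace sketch, not kernel
code. The three stubs only log their own names; vendor_run_current(),
vendor_run_proposed(), vm_enter(), trace_current() and trace_proposed() are
hypothetical names made up for illustration, and the sketch assumes the
tracepoint has already been dropped from wait_lapic_expire().

```c
#include <stdio.h>
#include <string.h>

/* Records the order in which the stand-in functions are called. */
static char call_log[128];

static void log_call(const char *name)
{
	size_t used = strlen(call_log);

	snprintf(call_log + used, sizeof(call_log) - used, "%s%s",
		 used ? " -> " : "", name);
}

/* Stubs: they share names with the kernel functions but only log. */
static void wait_lapic_expire(void)  { log_call("wait_lapic_expire"); }
static void guest_enter_irqoff(void) { log_call("guest_enter_irqoff"); }
static void vm_enter(void)           { log_call("vm_enter"); }

/* Hypothetical stand-in for kvm_x86_ops->run() as it is called today. */
static void vendor_run_current(void)
{
	vm_enter();
}

/* Hypothetical stand-in for kvm_x86_ops->run() with the wait moved in. */
static void vendor_run_proposed(void)
{
	wait_lapic_expire();	/* assumed to no longer contain a tracepoint */
	vm_enter();
}

/* Today: the busy wait sits above guest_enter_irqoff(). */
static const char *trace_current(void)
{
	call_log[0] = '\0';
	wait_lapic_expire();
	guest_enter_irqoff();
	vendor_run_current();
	return call_log;
}

/* Proposed: the busy wait runs inside the vendor run path. */
static const char *trace_proposed(void)
{
	call_log[0] = '\0';
	guest_enter_irqoff();
	vendor_run_proposed();
	return call_log;
}
```

trace_current() returns "wait_lapic_expire -> guest_enter_irqoff -> vm_enter"
and trace_proposed() returns "guest_enter_irqoff -> wait_lapic_expire ->
vm_enter". In the second ordering the busy wait lands after
guest_enter_irqoff(), so it is accounted as guest time.
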
[1] Commit d0659d946be0 ("KVM: x86: add option to advance tscdeadline hrtimer expiration")
[2] Commit 8b89fe1f6c43 ("kvm: x86: move tracepoints outside extended quiescent state")
[3] https://patchwork.kernel.org/patch/7821111/