On Sun, May 05, 2019 at 08:43:24AM +0800, Wanpeng Li wrote:
> On Wed, 1 May 2019 at 03:31, Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > On Sun, Apr 28, 2019 at 08:54:30AM +0800, Wanpeng Li wrote:
> > > Hi Sean,
> > > On Thu, 18 Apr 2019 at 01:18, Sean Christopherson
> > > <sean.j.christopherson@xxxxxxxxx> wrote:
> > > >
> > > > KVM's recently introduced adaptive tuning of lapic_timer_advance_ns
> > > > has several critical flaws:
> > > [.../...]
> > > >
> > > >  - TSC scaling is done on a per-vCPU basis, while the advancement
> > > >    value is global.  This issue is also present without adaptive
> > > >    tuning, but is now more pronounced.
> > >
> > > Did you test this in an overcommit scenario?  Your per-vCPU variable
> > > can settle at a large value (yeah, below your 5000ns cap) when
> > > neighbour VMs on the same host consume CPU heavily, but then KVM
> > > will waste a lot of time waiting when those neighbour VMs go idle.
> > > My original patch evaluates the conservative hypervisor overhead
> > > when the first VM is deployed on the host, so it doesn't matter
> > > whether or not the VMs on this host alter their workload behaviour
> > > later.  The only alternative is to keep tuning the per-vCPU variable
> > > continuously, which I think will introduce more overhead.  So
> > > Liran's patch "Consider LAPIC TSC-Deadline Timer expired if deadline
> > > too short" also can't depend on this.
> >
> > I didn't test it in overcommit scenarios.  I wasn't aware of how the
>
> I think it should be considered.
>
> > automatic adjustments were being used in real deployments.
> >
> > The best option I can think of is to expose a vCPU's advance time to
> > userspace (not sure what mechanism would be best).  This would allow
> > userspace to run a single-vCPU VM with auto-tuning enabled, snapshot
> > the final adjusted advancement, and then update KVM's parameter to
> > set an explicit advancement and effectively disable auto-tuning.
>
> That procedure is too complex to deploy in a real environment, the same
> as running w/o auto-tuning at all.  My auto-tuning patch evaluates the
> conservative hypervisor overhead when the first VM is deployed on the
> host, and auto-tunes only once for the whole machine.

But even then the advancement could be corrupted or wildly inaccurate
unless that first VM has a single vCPU.

I thought of an idea that will hopefully fix the overcommit scenario and
in general reduce the time spent auto-adjusting.  Patch incoming...
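
To make the kind of per-vCPU adjustment being discussed concrete, below
is a rough illustrative sketch in standalone C.  It is not KVM code and
not the incoming patch; the struct, function, and constant names are
made up for this example, and 5000ns is just the cap mentioned earlier
in the thread.

    #include <stdint.h>

    /*
     * Illustrative sketch only -- not KVM code.  Adjust a per-vCPU timer
     * advancement based on how far from the programmed deadline the
     * guest actually observed the interrupt.
     */
    #define EXAMPLE_TIMER_ADVANCE_MAX_NS   5000   /* cap discussed above */
    #define EXAMPLE_TIMER_ADVANCE_STEP     8      /* damp each adjustment */

    struct example_lapic_timer {
            uint32_t timer_advance_ns;    /* per-vCPU advancement being tuned */
    };

    /*
     * guest_error_ns < 0: the interrupt arrived before the deadline, i.e.
     * we advanced too much, so back off.  guest_error_ns > 0: it arrived
     * late, so advance more.  Only a fraction of the error is applied per
     * sample so that a single noisy measurement (e.g. one taken while a
     * neighbour VM hogs the pCPU) does not permanently inflate the value.
     */
    static void example_adjust_timer_advance(struct example_lapic_timer *t,
                                             int64_t guest_error_ns)
    {
            int64_t adj = guest_error_ns / EXAMPLE_TIMER_ADVANCE_STEP;
            int64_t advance = (int64_t)t->timer_advance_ns + adj;

            if (advance < 0)
                    advance = 0;
            if (advance > EXAMPLE_TIMER_ADVANCE_MAX_NS)
                    advance = EXAMPLE_TIMER_ADVANCE_MAX_NS;

            t->timer_advance_ns = (uint32_t)advance;
    }

The point of the damping step is that the per-vCPU advancement only
moves a fraction of the observed error each time, so a handful of
samples taken while neighbour VMs are busy should not leave the value
stuck near the cap once those VMs go idle.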