On Sat, Jan 08, 2022 at 12:26:32AM +0000, Sean Christopherson wrote:
> +Peter
>
> On Wed, Jan 05, 2022, David Woodhouse wrote:
> > On Thu, 2021-12-23 at 21:24 +0000, Sean Christopherson wrote:
> > > Commit e880c6ea55b9 ("KVM: x86: hyper-v: Prevent using not-yet-updated TSC page
> > > by secondary CPUs") is squarely to blame as it was added after dirty ring, though
> > > in Vitaly's defense, David put it best: "That's a fairly awful bear trap".
> >
> > Even with the WARN to keep us honest, this is still awful.
> >
> > We have kvm_vcpu_write_guest()... but the vcpu we pass it is ignored
> > and only vcpu->kvm is used. But you have to be running on a physical
> > CPU which currently owns *a* vCPU of that KVM, or you might crash.
> >
> > There is also kvm_write_guest() which doesn't take a vCPU as an
> > argument, and looks for all the world like it was designed for you not
> > to need one... but which still needs there to be a vCPU or it might
> > crash.

Right, I agree those are slightly confusing too.

> > I think I want to kill the latter, make the former use the vCPU it's
> > given, add a spinlock to the dirty ring which will be uncontended
> > anyway in the common case so it shouldn't hurt (?),
>
> IIRC, Peter did a fair amount of performance testing and analysis that led to
> the current behavior.

Yes, that came out of the discussion when we were working on dirty ring.  I
can't remember all the details either, but IIRC what we figured out is that in
99% of cases guests dirty pages from within a vcpu context.  The only outlier
at that time (at least for x86) was kvm-gt, which used a bare kvm context, but
that turned out to be an illegal use of the KVM API and kvm-gt fixed itself
instead.

That's how we arrived at the current kvm_get_running_vcpu() approach: all the
dirty tracking contexts are within an existing vcpu context.  Dirty ring could
be simplified too, because we don't need the default vcpu0 trick, nor do we
need an extra ring just to record no-vcpu contexts (very early versions of
dirty ring did have N+1 rings, where N is the number of vcpus and the extra
one was for exactly this).

kvm_get_running_vcpu() also has the slight benefit that it guarantees no
spinlock is needed.

> > and then let people use kvm->vcpu[0] when they really need to, with a
> > documented caveat that when there are *no* vCPUs in a KVM, dirty tracking
> > might not work. Which is fine, as migration of a KVM that hasn't been fully
> > set up yet is silly.
>
> "when they really need to" can be a slippery slope, using vcpu[0] is also quite
> gross.  Though I 100% agree that always using kvm_get_running_vcpu() is awful.

I agree that none of them looks like a particularly clean approach.

So far I'd hope we can fix the problem with what Sean suggested: postponing
the page update until we do have a vcpu context, so that the assumption of
"having a vcpu context" stays true, at least for x86.  That would be the
simplest fix, afaiu.

Or is there any other explicit requirement to dirty guest pages without a vcpu
context at all?

Thanks,

--
Peter Xu
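
For reference, below is a minimal standalone userspace sketch of the
per-thread "running vcpu" pattern being discussed, i.e. why a per-vCPU dirty
ring combined with a kvm_get_running_vcpu()-style lookup trips a WARN when a
page is dirtied with no vCPU loaded.  The names (vcpu, dirty_ring,
get_running_vcpu(), mark_page_dirty()) loosely mirror the KVM ones but are
illustrative stand-ins, not the kernel implementation.

    #include <stdint.h>
    #include <stdio.h>

    #define RING_SIZE 16

    struct dirty_ring {
            uint64_t gfns[RING_SIZE];
            unsigned int used;
    };

    struct vcpu {
            int id;
            struct dirty_ring ring;   /* per-vCPU ring, as in dirty ring */
    };

    /* Stand-in for KVM's per-CPU kvm_running_vcpu, set around vcpu entry/exit. */
    static __thread struct vcpu *running_vcpu;

    static struct vcpu *get_running_vcpu(void)
    {
            return running_vcpu;
    }

    static void mark_page_dirty(uint64_t gfn)
    {
            struct vcpu *vcpu = get_running_vcpu();

            if (!vcpu) {
                    /* The "bear trap": no vcpu context, no ring to record into. */
                    fprintf(stderr, "WARN: dirtying gfn 0x%llx with no running vcpu\n",
                            (unsigned long long)gfn);
                    return;
            }
            vcpu->ring.gfns[vcpu->ring.used % RING_SIZE] = gfn;
            vcpu->ring.used++;
    }

    int main(void)
    {
            struct vcpu v = { .id = 0 };

            /* Emulate entering vcpu_run(): the per-thread pointer gets set... */
            running_vcpu = &v;
            mark_page_dirty(0x1234);        /* recorded in vcpu 0's ring */

            /* ...and cleared on exit, so a write now has no ring to land in. */
            running_vcpu = NULL;
            mark_page_dirty(0x5678);        /* hits the warning path */

            printf("vcpu %d ring entries: %u\n", v.id, v.ring.used);
            return 0;
    }

Because the ring is always reached through the per-thread/per-CPU pointer, no
spinlock is needed in the common case, which is the benefit mentioned above;
the flip side is that callers like kvm_write_guest() silently depend on a
vCPU being loaded even though their signatures suggest otherwise.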