On Wed, 2022-01-12 at 11:10 +0800, Peter Xu wrote:
> On Sat, Jan 08, 2022 at 12:26:32AM +0000, Sean Christopherson wrote:
> > +Peter
> >
> > On Wed, Jan 05, 2022, David Woodhouse wrote:
> > > On Thu, 2021-12-23 at 21:24 +0000, Sean Christopherson wrote:
> > > > Commit e880c6ea55b9 ("KVM: x86: hyper-v: Prevent using not-yet-updated TSC page
> > > > by secondary CPUs") is squarely to blame as it was added after dirty ring, though
> > > > in Vitaly's defense, David put it best: "That's a fairly awful bear trap".
> > >
> > > Even with the WARN to keep us honest, this is still awful.
> > >
> > > We have kvm_vcpu_write_guest()... but the vcpu we pass it is ignored
> > > and only vcpu->kvm is used. But you have to be running on a physical
> > > CPU which currently owns *a* vCPU of that KVM, or you might crash.
> > >
> > > There is also kvm_write_guest() which doesn't take a vCPU as an
> > > argument, and looks for all the world like it was designed for you not
> > > to need one... but which still needs there to be a vCPU or it might
> > > crash.
>
> Right, I agree too those are slightly confusing.
>
> > > I think I want to kill the latter, make the former use the vCPU it's
> > > given, add a spinlock to the dirty ring which will be uncontended
> > > anyway in the common case so it shouldn't hurt (?),
> >
> > IIRC, Peter did a fair amount of performance testing and analysis that led to
> > the current behavior.
>
> Yes that comes out from the discussion when working on dirty ring. I can't
> remember the details too, but IIRC what we figured is 99% cases of guest
> dirtying pages are with vcpu context.
>
> The only outlier at that time (at least for x86) was kvm-gt who uses direct kvm
> context but then it turns out that's an illegal use of the kvm API and kvm-gt
> fixed itself instead.
>
> Then we come to the current kvm_get_running_vcpu() approach because all the
> dirty track contexts are within an existing vcpu context. Dirty ring could be
> simplified too because we don't need the default vcpu0 trick, nor do we need an
> extra ring just to record no-vcpu contexts (in very early version of dirty ring
> we do have N+1 rings, where N is the vcpu number, and the extra is for this).
>
> kvm_get_running_vcpu() also has a slight benefit that it guarantees no spinlock
> needed.
>
> > > and then let people use kvm->vcpu[0] when they really need to, with a
> > > documented caveat that when there are *no* vCPUs in a KVM, dirty tracking
> > > might not work. Which is fine, as migration of a KVM that hasn't been fully
> > > set up yet is silly.
> >
> > "when they really need to" can be a slippery slope, using vcpu[0] is also quite
> > gross. Though I 100% agree that always using kvm_get_running_vcpu() is awful.
>
> I agreed none of them looks like an extreme clean approach.
>
> So far I'd hope we can fix the problem with what Sean suggested on postponing
> the page update until when we have the vcpu context, so we keep the assumption
> still true on "having a vcpu context" at least for x86 and that'll be the
> simplest, afaiu.
>
> Or do we have explicit other requirement that needs to dirty guest pages
> without vcpu context at all?

Delivering interrupts may want to do so. That's the one we hit for S390,
and I only avoided it for Xen event channel delivery on x86 by declaring
that the Xen shared info page is exempt from dirty tracking and should
*always* be considered dirty.
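
To make the trap concrete, the dirty-ring path we keep tripping over looks
roughly like this. This is a simplified sketch from memory, so the helper
names (kvm_dirty_ring_get, mark_page_dirty_in_slot, etc.) and details may not
match the current tree exactly:

	/* Sketch only, not the literal code. */
	struct kvm_dirty_ring *kvm_dirty_ring_get(struct kvm *kvm)
	{
		struct kvm_vcpu *vcpu = kvm_get_running_vcpu();

		/* If nothing is in vCPU context, vcpu is NULL and this oopses. */
		WARN_ON_ONCE(vcpu->kvm != kvm);

		return &vcpu->dirty_ring;
	}

	void mark_page_dirty_in_slot(struct kvm *kvm,
				     const struct kvm_memory_slot *memslot,
				     gfn_t gfn)
	{
		if (memslot && kvm_slot_dirty_track_enabled(memslot)) {
			unsigned long rel_gfn = gfn - memslot->base_gfn;
			u32 slot = (memslot->as_id << 16) | memslot->id;

			if (kvm->dirty_ring_size)
				/* Per-vCPU ring, found via the *running* vCPU. */
				kvm_dirty_ring_push(kvm_dirty_ring_get(kvm),
						    slot, rel_gfn);
			else
				/* Bitmap path doesn't need a vCPU at all. */
				set_bit_le(rel_gfn, memslot->dirty_bitmap);
		}
	}

Both kvm_write_guest() and kvm_vcpu_write_guest() funnel into this, which is
why the vCPU argument to the latter makes no difference, and why calling
either without a running vCPU can blow up once the dirty ring is enabled.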