On Tue, 2023-09-19 at 15:34 +0100, Paul Durrant wrote:
> > > +	ret = kvm_gpc_activate(vi_gpc, gpa, sizeof(struct vcpu_info));
> > 
> > From this moment, can't interrupts be delivered to the new
> > vcpu_info, even though the memcpy hasn't happened yet?
> > 
> Hmm, that's a good point. TBH it would be nice to have an 'activate
> and leave locked' primitive to avoid this.

I suppose so from the caller's point of view in this case, but I'm
somewhat disinclined to add that complexity to the pfncache code.

We take the refresh_lock *mutex* in __kvm_gpc_refresh() so it's not as
simple as declaring that said function is called with the gpc rwlock
already held. We also do the final gpc_unmap_khva() of the old mapping
after dropping the lock; *could* we call that with a write lock held?
A write lock which is going to be taken in the MM notifier callbacks?

Well, maybe not in the case of the first *activate*, which isn't
really a 'refresh' per se, but the whole thing is making my skin itch.
I don't like it.

> > I think we need to ensure that any kvm_xen_set_evtchn_fast() which
> > happens at this point cannot proceed, and falls back to the slow
> > path.
> > 
> > Can we set a flag before we activate the vcpu_info and clear it
> > after the memcpy is done, then make kvm_xen_set_evtchn_fast()
> > return EWOULDBLOCK whenever that flag is set?
> > 
> > The slow path in kvm_xen_set_evtchn() takes kvm->arch.xen.xen_lock
> > and I think kvm_xen_vcpu_set_attr() has taken that same lock before
> > you get to this code, so it works out nicely?
> > 
> Yes, I think that is safe... but if we didn't have the window between
> activating the vcpu_info cache and doing the copy we'd also be OK I
> think... Or perhaps we could simply preserve evtchn_pending_sel and
> copy the rest of it?

I suppose you could just write the evtchn_pending_sel word in the new
vcpu_info GPA to zero before setting up the pfncache for it.
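Roughly this, as a userspace sketch of the zero-then-merge idea (the
struct here is a cut-down stand-in, not the real Xen struct vcpu_info,
and both helper names are made up for illustration):

```c
/*
 * Userspace model of preserving evtchn_pending_sel across a vcpu_info
 * move. Everything here is an illustrative stand-in for the real KVM
 * code; 'struct vcpu_info' is reduced to the one field that matters
 * for the race, plus a blob for the rest.
 */
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct vcpu_info {
	uint64_t evtchn_pending_sel;
	uint8_t other_state[56];	/* stand-in for the rest */
};

/*
 * Step 1: before activating the pfncache for the new GPA, zero the
 * selector word there, so any interrupt delivered during the window
 * only ever ORs bits into a known-clean word.
 */
static void prepare_new_vcpu_info(struct vcpu_info *dst)
{
	dst->evtchn_pending_sel = 0;
}

/*
 * Step 2: copy everything across, but merge rather than overwrite the
 * selector word, so bits set in the new location in the interim are
 * not lost.
 */
static void move_vcpu_info(struct vcpu_info *dst,
			   const struct vcpu_info *src)
{
	uint64_t interim = dst->evtchn_pending_sel;

	memcpy(dst, src, sizeof(*dst));
	dst->evtchn_pending_sel = src->evtchn_pending_sel | interim;
}
```

The point being that an interrupt landing in the window can only set
bits in a word we know started out clean, so the merge at copy time
cannot lose anything from either side.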
Then when you do the memcpy, you don't *just* memcpy the
evtchn_pending_sel word; you use the bitwise OR of the old and new, so
you catch any bits which got set in the new word in the interim?

But then again, who moves the vcpu_info while there are actually
interrupts in flight to the vCPU in question? Maybe we just declare
that we don't care, and that interrupts may be lost in that case. Even
if *Xen* wouldn't have lost them (and I don't even know that part is
true).

> > This adds a new lock ordering rule of the vcpu_info lock(s) before
> > the shared_info lock. I don't know that it's *wrong* but it seems
> > weird to me; I expected the shared_info to come first?
> > 
> > I avoided taking both at once in kvm_xen_set_evtchn_fast(),
> > although maybe if we are going to have a rule that allows both, we
> > could revisit that. Suspect it isn't needed.
> > 
> > Either way it is worth a clear comment somewhere to document the
> > lock ordering, and I'd also like to know this has been tested with
> > lockdep, which is often cleverer than me.
> > 
> Ok. I agree that shared_info before vcpu_info does seem more
> intuitive and maybe it would be better given the code in
> kvm_xen_set_evtchn_fast(). I'll see how messy it gets in re-ordering
> and add a comment as you suggest.

I think they look interchangeable in this case. If we *do* take them
both in kvm_xen_set_evtchn_fast() then maybe we can simplify the slow
path where it sets the bits in shared_info but then the vcpu_info gpc
was invalid. That currently uses a kvm->arch.xen.evtchn_pending_sel
shadow of the bits, and just kicks the vCPU to deliver them for
itself... but maybe that whole thing could be dropped, and
kvm_xen_set_evtchn_fast() can just return EWOULDBLOCK if it fails to
lock *both* shared_info and vcpu_info at the same time?

I didn't do that before, because I didn't want to introduce lock
ordering rules. But I'm happier to do so now.
And I think we can ditch a lot of hairy asm in kvm_xen_inject_pending_events() ?