Re: [PATCH 1/2] KVM: x86/xen: PV oneshot timer fixes

David Woodhouse <dwmw2@xxxxxxxxxxxxx> · Wed, 09 Mar 2022 17:11:41 +0100

On Wed, 2022-03-09 at 16:39 +0100, Paolo Bonzini wrote:
> On 3/9/22 15:38, David Woodhouse wrote:
> > +		if (hrtimer_active(&vcpu->arch.xen.timer))
> > +			data->u.timer.expires_ns = vcpu->arch.xen.timer_expires;
> > +		else
> > +			data->u.timer.expires_ns = 0;
> >   		r = 0;
> 
> This is still racy.  You should instead clear timer_expires in 
> xen_timer_callback, and only do so after a *successful* write to the 
> event channel (setting timer_pending is not enough).

Hm, I thought about setting expires_ns = 0 in xen_timer_callback() but
that can race with the vCPU making a hypercall to reconfigure the
timer.

> Still, this only works if userspace reads the pending events *after* 
> expires_ns.  It's the usual pattern:
> 
> 	xen_timer_callback()		userspace
> 	------------------------	------------------------
> 	kvm_xen_set_evtchn_fast()	read timer_expires
> 	smp_wmb()			smp_rmb()
> 	expires_ns = 0;			read event channel
> 
> Now, if expires is read as 0, userspace surely will read the event properly.
> 
> Is this doable with the KVM ioctls that you have, considering that the 
> pending events are in memory?  If not, you should add pause timer and 
> resume timer ioctls.  These will change the hrtimer state without 
> touching timer_expires and timer_pending.

That's OK, as the pending events are in the shared_info and vcpu_info
regions which we have explicitly declared exempt from dirty tracking.
Userspace must always consider them dirty any time an interrupt (which
includes timers) might be delivered, so it has to migrate that memory
in the final sync, after serializing the vCPU state.

In the local APIC delivery mode, the actual interrupt is injected as an
MSI, so userspace needs to read the timer state before the local APIC
state. 

> But even better, and even simpler: just set timer_pending in 
> xen_timer_callback, always go through the delayed injection path, and 
> remove this "if" from KVM_XEN_VCPU_GET_ATTR.  Then there are no races, 
> because KVM_RUN and KVM_XEN_VCPU_GET_ATTR are protected with vcpu->mutex:

Yeah, I don't think it even increases the latency as the vCPU had
already exited guest mode anyway. I'll do that, retest it and repost.

Thanks.

Attachment:
smime.p7s

Description: S/MIME cryptographic signature