Re: [PATCH 1/2] KVM: x86/xen: PV oneshot timer fixes

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Wed, 9 Mar 2022 16:39:58 +0100

On 3/9/22 15:38, David Woodhouse wrote:
+		if (hrtimer_active(&vcpu->arch.xen.timer))
+			data->u.timer.expires_ns = vcpu->arch.xen.timer_expires;
+		else
+			data->u.timer.expires_ns = 0;
  		r = 0;

This is still racy.  You should instead clear timer_expires in 
xen_timer_callback, and only do so after a *successful* write to the 
event channel (setting timer_pending is not enough).

Still, this only works if userspace reads the pending events *after* 
expires_ns.  It's the usual pattern:

	xen_timer_callback()		userspace
	------------------------	------------------------
	kvm_xen_set_evtchn_fast()	read timer_expires
	smp_wmb()			smp_rmb()
	expires_ns = 0;			read event channel

Now, if expires is read as 0, userspace surely will read the event properly.

Is this doable with the KVM ioctls that you have, considering that the 
pending events are in memory?  If not, you should add pause timer and 
resume timer ioctls.  These will change the hrtimer state without 
touching timer_expires and timer_pending.

But even better, and even simpler: just set timer_pending in 
xen_timer_callback, always go through the delayed injection path, and 
remove this "if" from KVM_XEN_VCPU_GET_ATTR.  Then there are no races, 
because KVM_RUN and KVM_XEN_VCPU_GET_ATTR are protected with vcpu->mutex:

	xen_timer_callback()		migration source
	------------------------	------------------------
					read event channel
	inc timer_pending
					ioctl(KVM_XEN_VCPU_GET_ATTR)
					  read timer_expires

timer_expires is *not* read as zero, because the clearing happens only 
in kvm_xen_inject_timer_irqs(), which cannot be concurrent with 
KVM_XEN_VCPU_GET_ATTR.  And the destination does:

	migration destination			xen_timer_callback()
	------------------------		------------------------
	ioctl(KVM_XEN_VCPU_SET_ATTR)
	  set timer_expires
	    hrtimer_start()
					        inc timer_pending
	ioctl(KVM_RUN)
	  kvm_xen_inject_timer_irqs()
	    kvm_xen_set_evtchn()
	      kvm_xen_set_evtchn_fast()
	    expires_ns = 0;

Paolo