On 3/9/22 15:38, David Woodhouse wrote:
+ if (hrtimer_active(&vcpu->arch.xen.timer))
+ data->u.timer.expires_ns = vcpu->arch.xen.timer_expires;
+ else
+ data->u.timer.expires_ns = 0;
r = 0;
This is still racy. You should instead clear timer_expires in
xen_timer_callback, and only do so after a *successful* write to the
event channel (setting timer_pending is not enough).
Still, this only works if userspace reads the pending events *after*
expires_ns. It's the usual pattern:
xen_timer_callback() userspace
------------------------ ------------------------
kvm_xen_set_evtchn_fast() read timer_expires
smp_wmb() smp_rmb()
expires_ns = 0; read event channel
Now, if expires is read as 0, userspace surely will read the event properly.
Is this doable with the KVM ioctls that you have, considering that the
pending events are in memory? If not, you should add pause timer and
resume timer ioctls. These will change the hrtimer state without
touching timer_expires and timer_pending.
But even better, and even simpler: just set timer_pending in
xen_timer_callback, always go through the delayed injection path, and
remove this "if" from KVM_XEN_VCPU_GET_ATTR. Then there are no races,
because KVM_RUN and KVM_XEN_VCPU_GET_ATTR are protected with vcpu->mutex:
xen_timer_callback() migration source
------------------------ ------------------------
read event channel
inc timer_pending
ioctl(KVM_XEN_VCPU_GET_ATTR)
read timer_expires
timer_expires is *not* read as zero, because the clearing happens only
in kvm_xen_inject_timer_irqs(), which cannot be concurrent with
KVM_XEN_VCPU_GET_ATTR. And the destination does:
migration destination xen_timer_callback()
------------------------ ------------------------
ioctl(KVM_XEN_VCPU_SET_ATTR)
set timer_expires
hrtimer_start()
inc timer_pending
ioctl(KVM_RUN)
kvm_xen_inject_timer_irqs()
kvm_xen_set_evtchn()
kvm_xen_set_evtchn_fast()
expires_ns = 0;
Paolo