Hi Sean,

Thanks for taking the time to look at this.

On Wed Oct 4, 2023 at 11:47 PM UTC, Sean Christopherson wrote:
[...]
> > @@ -3996,6 +4002,39 @@ static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
> >  	return 0;
> >  }
> >  
> > +static __poll_t kvm_vcpu_poll(struct file *file, poll_table *wait)
> > +{
> > +	struct kvm_vcpu *vcpu = file->private_data;
> > +
> > +	if (!vcpu->poll_mask)
> > +		return EPOLLERR;
> > +
> > +	switch (READ_ONCE(vcpu->mode)) {
> > +	case OUTSIDE_GUEST_MODE:
> > +		/*
> > +		 * Make sure writes to vcpu->request are visible before the
> > +		 * mode changes.
> > +		 */
>
> Huh?  There are no writes to vcpu->request anywhere in here.

My thinking was that the vcpu->requests load below could be speculated ahead
of the store to vcpu->mode, which would miss events when first entering
poll().

Since you pointed this out, I thought about it further. There is still room
for a race with the code as is: we need to read vcpu->requests only after
poll_wait() returns, so as to make sure concurrent
kvm_make_request()/kvm_vcpu_kick() calls either wake up the poller, or are
visible through the vcpu->requests check that precedes sleeping.

[...]

> > +		WRITE_ONCE(vcpu->mode, OUTSIDE_GUEST_MODE);
>
> This does not look remotely safe on multiple fronts.  For starters, I don't see
> anything in the .poll() infrastructure that provides serialization, e.g. if there
> are multiple tasks polling then this will be "interesting".

Would allowing only one poller be acceptable?

> And there is zero chance this is race-free, e.g. nothing prevents the vCPU task
> itself from changing vcpu->mode from POLLING_FOR_EVENTS to something else.
>
> Why on earth is this mucking with vcpu->mode?  Ignoring for the moment that using
> vcpu->requests as the poll source is never going to happen, there's zero reason

IIUC, accessing vcpu->requests in kvm_vcpu_poll() is out of the question?
Aren't we forced to do so in order to avoid the race I mentioned above?

> to write vcpu->mode.  From a correctness perspective, AFAICT there's no need for
> any shenanigans at all, i.e. kvm_make_vcpu_request() could blindly and unconditionally
> call wake_up_interruptible().

I was fixated on the halt/vtl_return use cases, where we're either running
the vCPU or polling, and it seemed a decent way to decide whether calling
wake_up_interruptible() is needed. Clearly that's not the case; I'll get rid
of all the vcpu->mode mucking. :)

> I suspect what you want is a fast way to track if there *may* be pollers.  Keying
> off and *writing* vcpu->mode makes no sense to me.
>
> I think what you want is something like this, where kvm_vcpu_poll() could use
> atomic_fetch_or() and atomic_fetch_andnot() to manipulate vcpu->poll_mask.
> Or if we only want to support a single poller at a time, it could be a vanilla
> u64.  I suspect getting the poll_mask manipulation correct for multiple pollers
> would be tricky, e.g. to avoid false negatives and leave a poller hanging.

I'll have a go at the multiple poller approach.

> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 486800a7024b..5a260fb3b248 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -259,6 +259,14 @@ static inline bool kvm_kick_many_cpus(struct cpumask *cpus, bool wait)
>  	return true;
>  }
>  
> +static inline bool kvm_request_is_being_polled(struct kvm_vcpu *vcpu,
> +					       unsigned int req)
> +{
> +	u32 poll_mask = kvm_request_to_poll_mask(req);
> +
> +	return (atomic_read(vcpu->poll_mask) & poll_mask)
> +}
> +
>  static void kvm_make_vcpu_request(struct kvm_vcpu *vcpu, unsigned int req,
>  				  struct cpumask *tmp, int current_cpu)
>  {
> @@ -285,6 +293,9 @@ static void kvm_make_vcpu_request(struct kvm_vcpu *vcpu, unsigned int req,
>  		if (cpu != -1 && cpu != current_cpu)
>  			__cpumask_set_cpu(cpu, tmp);
>  	}
> +
> +	if (kvm_request_is_being_polled(vcpu, req))
> +		wake_up_interruptible(...);
>  }
>  
>  bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,

I'll use this approach.
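For illustration, a minimal userspace model of the ordering discussed above, assuming C11 seq_cst atomics stand in for the kernel's barriers: the requester publishes its request bit before checking for pollers, and the poller registers itself (the poll_wait() step) before reading the requests, so at least one side always observes the other. To sidestep the multi-poller false negatives, it swaps the single poll_mask for hypothetical per-flag poller counts. Every name here (struct vcpu_model, make_request(), poll_register(), etc.) is made up; in the kernel the same pairing would presumably need explicit full barriers, e.g. smp_mb__after_atomic() after set_bit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Userspace model, not kernel code; every name here is hypothetical.
 * requests models vcpu->requests, pollers[] models a multi-poller
 * replacement for vcpu->poll_mask, and wakeups counts the calls that
 * would be wake_up_interruptible() in KVM.
 */
#define NR_POLL_FLAGS   4
#define POLL_INTERRUPTS 0 /* index of a KVM_POLL_INTERRUPTS_FLAG-like flag */

struct vcpu_model {
	atomic_ulong requests;
	atomic_int pollers[NR_POLL_FLAGS]; /* active pollers per flag */
	atomic_int wakeups;
};

static void vcpu_model_init(struct vcpu_model *v)
{
	atomic_init(&v->requests, 0UL);
	for (int i = 0; i < NR_POLL_FLAGS; i++)
		atomic_init(&v->pollers[i], 0);
	atomic_init(&v->wakeups, 0);
}

/* Hypothetical mapping: in this model every request is an "interrupt". */
static int request_to_poll_flag(unsigned int req)
{
	(void)req;
	return POLL_INTERRUPTS;
}

/*
 * Requester: publish the request bit first, then look for pollers.
 * All operations are seq_cst, so either this load sees the poller's
 * registration, or the poller's later load of requests sees this bit.
 */
static void make_request(struct vcpu_model *v, unsigned int req)
{
	atomic_fetch_or(&v->requests, 1UL << req);
	if (atomic_load(&v->pollers[request_to_poll_flag(req)]) > 0)
		atomic_fetch_add(&v->wakeups, 1); /* wake_up_interruptible() */
}

/* Poller: register interest first (the poll_wait() step)... */
static void poll_register(struct vcpu_model *v, int flag)
{
	atomic_fetch_add(&v->pollers[flag], 1);
}

/* ...and only then read the pending requests. */
static bool poll_check(struct vcpu_model *v)
{
	return atomic_load(&v->requests) != 0;
}

/* A departing poller drops only its own count, so others stay visible. */
static void poll_unregister(struct vcpu_model *v, int flag)
{
	int old = atomic_fetch_sub(&v->pollers[flag], 1);

	assert(old > 0);
	(void)old; /* avoid an unused-variable warning under NDEBUG */
}
```

Counting per flag means a poller leaving never clears interest that another poller still holds, which a shared bitmask cleared with atomic_fetch_andnot() could do. As for the quoted helper, atomic_read() takes a pointer, so it would read atomic_read(&vcpu->poll_mask).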
Since we have to provide a proper uAPI, do you have any objection to user
space setting the polling mask through an ioctl?

Also, any suggestions on what kvm_request_to_poll_mask() should look like?
For example, VSM mostly cares about regular interrupts/timers, so mapping
KVM_REQ_UNBLOCK, KVM_REQ_HV_STIMER, KVM_REQ_EVENT, KVM_REQ_SMI, and
KVM_REQ_NMI to a KVM_POLL_INTERRUPTS_FLAG would work. We could then add
ad-hoc flags for async PF, kvmclock updates, dirty logging, etc.

Nicolas
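A rough userspace sketch of the request-to-flag mapping proposed above, assuming kvm_request_to_poll_mask() is a plain switch over request numbers and that a hypothetical "set poll mask" ioctl rejects unknown flag bits. The request names follow KVM's, but their numeric values here are placeholders, not the kernel's. Only KVM_POLL_INTERRUPTS_FLAG comes from this thread; KVM_POLL_ASYNC_PF_FLAG, KVM_POLL_CLOCK_UPDATE_FLAG and set_poll_mask() are made-up names.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Placeholder request numbers; the names mirror KVM's, the values do not. */
enum model_req {
	KVM_REQ_UNBLOCK,
	KVM_REQ_HV_STIMER,
	KVM_REQ_EVENT,
	KVM_REQ_SMI,
	KVM_REQ_NMI,
	KVM_REQ_APF_READY,
	KVM_REQ_CLOCK_UPDATE,
	KVM_REQ_TLB_FLUSH, /* example of a request nobody polls for */
};

#define KVM_POLL_INTERRUPTS_FLAG   (1u << 0)
#define KVM_POLL_ASYNC_PF_FLAG     (1u << 1) /* hypothetical */
#define KVM_POLL_CLOCK_UPDATE_FLAG (1u << 2) /* hypothetical */
#define KVM_POLL_VALID_FLAGS (KVM_POLL_INTERRUPTS_FLAG | \
			      KVM_POLL_ASYNC_PF_FLAG | \
			      KVM_POLL_CLOCK_UPDATE_FLAG)

static uint32_t kvm_request_to_poll_mask(unsigned int req)
{
	switch (req) {
	case KVM_REQ_UNBLOCK:
	case KVM_REQ_HV_STIMER:
	case KVM_REQ_EVENT:
	case KVM_REQ_SMI:
	case KVM_REQ_NMI:
		return KVM_POLL_INTERRUPTS_FLAG;
	case KVM_REQ_APF_READY:
		return KVM_POLL_ASYNC_PF_FLAG;
	case KVM_REQ_CLOCK_UPDATE:
		return KVM_POLL_CLOCK_UPDATE_FLAG;
	default:
		return 0; /* not pollable: never wakes a poller */
	}
}

/* Models the validation a hypothetical "set poll mask" ioctl would do. */
static int set_poll_mask(uint32_t *poll_mask, uint32_t user_mask)
{
	if (user_mask & ~KVM_POLL_VALID_FLAGS)
		return -EINVAL; /* unknown bits stay reserved for future flags */
	*poll_mask = user_mask;
	return 0;
}
```

Rejecting unknown bits up front keeps them free for later ad-hoc flags (dirty logging, etc.) without silently changing behavior for existing user space.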