[PATCH v3 3/6] KVM: Support poll() on coalesced mmio buffer fds

Ilias Stamatis <ilstam@xxxxxxxxxx> · Tue, 20 Aug 2024 14:33:30 +0100

There is no direct way for userspace to be notified about coalesced MMIO
writes when using KVM_REGISTER_COALESCED_MMIO. If the next MMIO exit is
when the ring buffer has filled then a substantial (and unbounded)
amount of time may have passed since the first coalesced MMIO.

To improve this, make it possible for userspace to use poll() and
select() on the fd returned by the KVM_CREATE_COALESCED_MMIO_BUFFER
ioctl. This way a userspace VMM could have dedicated threads that deal
with writes to specific MMIO zones.

For example, a common use of MMIO, particularly in the realm of network
devices, is as a doorbell. A write to a doorbell register will trigger
the device to initiate a DMA transfer.

When a network device is emulated by userspace a write to a doorbell
register would typically result in an MMIO exit so that userspace can
emulate the DMA transfer in a timely manner. No further processing can
be done until userspace performs the necessary emulation and re-invokes
KVM_RUN. Even if userspace makes use of another thread to emulate the
DMA transfer such MMIO exits are disruptive to the vCPU and they may
also be quite frequent if, for example, the vCPU is sending a sequence
of short packets to the network device.

By supporting poll() on coalesced buffer fds, userspace can have
dedicated threads wait for new doorbell writes and avoid the performance
hit of userspace exits on the main vCPU threads.

Signed-off-by: Ilias Stamatis <ilstam@xxxxxxxxxx>
---

v2->v3:
  - Changed POLLIN | POLLRDNORM to EPOLLIN | EPOLLRDNORM

 virt/kvm/coalesced_mmio.c | 22 ++++++++++++++++++++++
 virt/kvm/coalesced_mmio.h |  1 +
 2 files changed, 23 insertions(+)

diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c
index 98b7e8760aa7..039c6ffcb2a8 100644
--- a/virt/kvm/coalesced_mmio.c
+++ b/virt/kvm/coalesced_mmio.c
@@ -16,6 +16,7 @@
 #include <linux/slab.h>
 #include <linux/kvm.h>
 #include <linux/anon_inodes.h>
+#include <linux/poll.h>
 
 #include "coalesced_mmio.h"
 
@@ -97,6 +98,10 @@ static int coalesced_mmio_write(struct kvm_vcpu *vcpu,
 	smp_wmb();
 	ring->last = (insert + 1) % KVM_COALESCED_MMIO_MAX;
 	spin_unlock(lock);
+
+	if (dev->buffer_dev)
+		wake_up_interruptible(&dev->buffer_dev->wait_queue);
+
 	return 0;
 }
 
@@ -223,9 +228,25 @@ static int coalesced_mmio_buffer_release(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static __poll_t coalesced_mmio_buffer_poll(struct file *file, struct poll_table_struct *wait)
+{
+	struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;
+	__poll_t mask = 0;
+
+	poll_wait(file, &dev->wait_queue, wait);
+
+	spin_lock(&dev->ring_lock);
+	if (dev->ring && (READ_ONCE(dev->ring->first) != READ_ONCE(dev->ring->last)))
+		mask = EPOLLIN | EPOLLRDNORM;
+	spin_unlock(&dev->ring_lock);
+
+	return mask;
+}
+
 static const struct file_operations coalesced_mmio_buffer_ops = {
 	.mmap = coalesced_mmio_buffer_mmap,
 	.release = coalesced_mmio_buffer_release,
+	.poll = coalesced_mmio_buffer_poll,
 };
 
 int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm)
@@ -239,6 +260,7 @@ int kvm_vm_ioctl_create_coalesced_mmio_buffer(struct kvm *kvm)
 		return -ENOMEM;
 
 	dev->kvm = kvm;
+	init_waitqueue_head(&dev->wait_queue);
 	spin_lock_init(&dev->ring_lock);
 
 	ret = anon_inode_getfd("coalesced_mmio_buf", &coalesced_mmio_buffer_ops,
diff --git a/virt/kvm/coalesced_mmio.h b/virt/kvm/coalesced_mmio.h
index 37d9d8f325bb..d1807ce26464 100644
--- a/virt/kvm/coalesced_mmio.h
+++ b/virt/kvm/coalesced_mmio.h
@@ -26,6 +26,7 @@ struct kvm_coalesced_mmio_dev {
 struct kvm_coalesced_mmio_buffer_dev {
 	struct list_head list;
 	struct kvm *kvm;
+	wait_queue_head_t wait_queue;
 	spinlock_t ring_lock;
 	struct kvm_coalesced_mmio_ring *ring;
 };
-- 
2.34.1