On Wed, 2024-08-28 at 07:25 -0700, Sean Christopherson wrote:
> On Tue, Aug 20, 2024, Ilias Stamatis wrote:
> > The current MMIO coalescing design has a few drawbacks which limit its
> > usefulness. Currently all coalesced MMIO zones use the same ring buffer.
> > That means that upon a userspace exit we have to handle potentially
> > unrelated MMIO writes synchronously. And a VM-wide lock needs to be
> > taken in the kernel when an MMIO exit occurs.
> > 
> > Additionally, there is no direct way for userspace to be notified about
> > coalesced MMIO writes. If the next MMIO exit to userspace is when the
> > ring buffer has filled then a substantial (and unbounded) amount of time
> > may have passed since the first coalesced MMIO.
> > 
> > Add a KVM_CREATE_COALESCED_MMIO_BUFFER ioctl to KVM. This ioctl simply
> 
> Why allocate the buffer in KVM? It allows reusing coalesced_mmio_write() without
> needing {get,put}_user() or any form of kmap(), but it adds complexity (quite a
> bit in fact) to KVM and inherits many of the restrictions and warts of the existing
> coalesced I/O implementation.
> 
> E.g. the size of the ring buffer is "fixed", yet variable based on the host kernel
> page size.
> 
> Why not go with a BYOB model? E.g. so that userspace can explicitly define the
> buffer size to best fit the properties of the I/O range, and to avoid additional
> complexity in KVM.

Fair enough. I used the original implementation as a guide, but let's do it
Right™ this time.

> 
> > returns a file descriptor to the caller but does not allocate a ring
> > buffer. Userspace can then pass this fd to mmap() to actually allocate a
> > buffer and map it to its address space.
> > 
> > Subsequent patches will allow userspace to:
> > 
> > - Associate the fd with a coalescing zone when registering it so that
> >   writes to that zone are accumulated in that specific ring buffer
> >   rather than the VM-wide one.
> > - Poll for MMIO writes using this fd.
> 
> Why? I get the desire for a doorbell, but KVM already supports "fast" I/O for
> doorbells.

What are you referring to here? ioeventfd? Correct me if I am wrong, but my
understanding is that with an ioeventfd the written value is not made
available to userspace, and that is a problem when that value is a head or
tail pointer.

I also realised that I need to improve my commit messages.

> The only use case I can think of is where the doorbell sits in the
> same region as the data payload, but that essentially just saves an entry in
> KVM's MMIO bus (or I suppose two entries if the doorbell is in the middle of
> the region).
> 
> > 
> > Signed-off-by: Ilias Stamatis <ilstam@xxxxxxxxxx>
> > Reviewed-by: Paul Durrant <paul@xxxxxxx>
> > ---
> 
> ...
> 
> > +static void coalesced_mmio_buffer_vma_close(struct vm_area_struct *vma)
> > +{
> > +	struct kvm_coalesced_mmio_buffer_dev *dev = vma->vm_private_data;
> > +
> > +	spin_lock(&dev->ring_lock);
> > +
> > +	vfree(dev->ring);
> > +	dev->ring = NULL;
> > +
> > +	spin_unlock(&dev->ring_lock);
> 
> I doubt this is correct. I don't see VM_DONTCOPY, and so userspace can create a
> second (or more) VMA by forking, and then KVM will hit a UAF if any of the VMAs
> is closed.

Oops. Will fix.
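
For the fork problem specifically, I suppose the simplest fix is to set
VM_DONTCOPY (plus VM_DONTEXPAND/VM_DONTDUMP for good measure) at mmap() time,
unless you would rather see the ring refcounted. Something along these lines
in the mmap handler, completely untested:

static int coalesced_mmio_buffer_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;

	/*
	 * Don't copy the VMA on fork(), otherwise a child tearing down its
	 * copy frees the ring underneath the parent's mapping.  VM_DONTEXPAND
	 * also forbids growing the mapping via mremap().
	 */
	vm_flags_set(vma, VM_DONTCOPY | VM_DONTEXPAND | VM_DONTDUMP);

	vma->vm_ops = &coalesced_mmio_buffer_vm_ops;
	vma->vm_private_data = dev;

	/* ... existing ring allocation and remapping here ... */

	return 0;
}
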
> 
> > +}
> > +
> > +static const struct vm_operations_struct coalesced_mmio_buffer_vm_ops = {
> > +	.close = coalesced_mmio_buffer_vma_close,
> > +};
> > +
> > +static int coalesced_mmio_buffer_mmap(struct file *file, struct vm_area_struct *vma)
> > +{
> > +	struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;
> > +	unsigned long pfn;
> > +	int ret = 0;
> > +
> > +	spin_lock(&dev->ring_lock);
> > +
> > +	if (dev->ring) {
> 
> Restricting userspace to a single mmap() is a very bizarre restriction.
> 

Hmm, you are right. Even if the buffer is allocated in the kernel and one has
already been allocated, we should let the user map it at multiple addresses.
I will fix that too.

> > +		ret = -EBUSY;
> > +		goto out_unlock;
> > +	}
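
Something roughly like the below, I think, so that only the first mmap()
allocates the ring and later calls just map the existing one. Untested, still
assumes the kernel-allocated buffer, uses remap_vmalloc_range() for brevity
instead of the per-pfn remapping in the patch, and with multiple mappings
allowed the ring would then have to be freed on fd release (or refcounted)
rather than in the VMA close handler:

static int coalesced_mmio_buffer_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct kvm_coalesced_mmio_buffer_dev *dev = file->private_data;
	struct kvm_coalesced_mmio_ring *ring;

	/* Allocate outside the spinlock, vmalloc_user() may sleep. */
	ring = vmalloc_user(PAGE_SIZE);
	if (!ring)
		return -ENOMEM;

	spin_lock(&dev->ring_lock);
	if (!dev->ring) {
		/* The first mmap() on the fd installs the ring... */
		dev->ring = ring;
		ring = NULL;
	}
	spin_unlock(&dev->ring_lock);

	/* ...subsequent calls drop the spare and map the existing ring. */
	vfree(ring);

	return remap_vmalloc_range(vma, dev->ring, 0);
}

Though if we switch to the BYOB model you suggested, most of this goes away
anyway.

Thanks for the review.

Ilias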