Re: [PATCH 2/3] KVM: Emulate MSI-X table in kernel

Sheng Yang <sheng@xxxxxxxxxxxxxxx> · Wed, 19 Jan 2011 16:37:37 +0800

On Monday 17 January 2011 20:39:30 Marcelo Tosatti wrote:
> On Mon, Jan 17, 2011 at 08:18:22PM +0800, Sheng Yang wrote:
> > > > +		goto out;
> > > > +
> > > > +	mmio = &mmio_dev->mmio[idx];
> > > > +	entry = (addr - mmio->table_base_addr) / PCI_MSIX_ENTRY_SIZE;
> > > > +	entry_base = mmio->table_base_va + entry * PCI_MSIX_ENTRY_SIZE;
> > > > +	ctrl_pos = (u32 *)(entry_base + PCI_MSIX_ENTRY_VECTOR_CTRL);
> > > > +
> > > > +	if (get_user(old_ctrl, ctrl_pos))
> > > > +		goto out;
> > > > +
> > > > +	/* No allow writing to other fields when entry is unmasked */
> > > > +	if (!(old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
> > > > +	    offset != PCI_MSIX_ENTRY_VECTOR_CTRL)
> > > > +		goto out;
> > > > +
> > > > +	if (copy_to_user((void __user *)(entry_base + offset), val, len))
> > > > +		goto out;
> > > 
> > > Instead of copying to/from userspace (which is subject to swapin,
> > > unexpected values), you could include the guest written value in a
> > > kvm_run structure, along with address. Qemu-kvm would use that to
> > > synchronize its copy of the table, on KVM_EXIT_MSIX_ROUTING_UPDATE
> > > exit.
> > 
> > We want to acelerate MSI-X mask bit accessing, which won't exit to
> > userspace in the most condition. That's the cost we want to optimize.
> > Also it's possible to userspace to read the correct value of MMIO(but
> > mostly userspace can't write to it in order to prevent synchronize
> > issue).
> 
> Yes, i meant exit to userspace only when necessary, but instead of
> copying directly everytime record the value the guest has written in
> kvm_run and exit with KVM_EXIT_MSIX_ROUTING_UPDATE.
> 
> > > > +	if (get_user(new_ctrl, ctrl_pos))
> > > > +		goto out;
> > > > +
> > > > +	if ((offset < PCI_MSIX_ENTRY_VECTOR_CTRL && len == 4) ||
> > > > +	    (offset < PCI_MSIX_ENTRY_DATA && len == 8))
> > > > +		ret = -ENOTSYNC;
> > > > +	if (old_ctrl == new_ctrl)
> > > > +		goto out;
> > > > +	if (!(old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
> > > > +			(new_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > > > +		r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 1);
> > > > +	else if ((old_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT) &&
> > > > +			!(new_ctrl & PCI_MSIX_ENTRY_CTRL_MASKBIT))
> > > > +		r = update_msix_mask_bit(mmio_dev->kvm, mmio, entry, 0);
> > > 
> > > Then you rely on the kernel copy of the values to enable/disable irq.
> > 
> > Yes, they are guest owned assigned IRQs. Any potential issue?
> 
> Nothing prevents the kernel from enabling or disabling irq multiple
> times with this code (what prevents it is a qemu-kvm that behaves as
> expected). This is not very good.
> 
> Perhaps the guest can only harm itself with that, but i'm not sure.

MSI-X interrupts are not shared, so I think guest can only harm itself if it was 
doing it wrong.
> 
> And also if an msix table page is swapped out guest will hang.
> 
> > > > +	return r;
> > > > +}
> > > 
> > > This is not consistent with registration, where there are no checks
> > > regarding validity of assigned device id. So why is it necessary?
> > 
> > I am not quite understand. We need to free mmio anyway, otherwise it
> > would result in wrong mmio interception...
> 
> If you return -EOPNOTSUPP in case kvm_find_assigned_dev fails in the
> read/write paths, there is no need to free anything.

It may work with assigned devices, but the potential user of this is including vfio 
drivers and emulate devices. And I don't think it's a good idea to have 
registeration process but no free process...

--
regards
Yang, Sheng

> 
> > > There is a lock ordering problem BTW:
> > > 
> > > - read/write handlers: dev->lock -> kvm->lock
> > > - vm_ioctl_deassign_device -> kvm_free_msix_mmio: kvm->lock ->
> > > dev->lock
> > 
> > Good catch! Would fix it(and other comments of course).
> > 
> > --
> > regards
> > Yang, Sheng
> > 
> > > > +
> > > > +int kvm_vm_ioctl_unregister_msix_mmio(struct kvm *kvm,
> > > > +				      struct kvm_msix_mmio_user *mmio_user)
> > > > +{
> > > > +	struct kvm_msix_mmio mmio;
> > > > +
> > > > +	mmio.dev_id = mmio_user->dev_id;
> > > > +	mmio.type = mmio_user->type;
> > > > +
> > > > +	return kvm_free_msix_mmio(kvm, &mmio);
> > > > +}
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html