Re: [PATCH 8/8] KVM: Emulation MSI-X mask bits for assigned devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wednesday 20 October 2010 16:26:32 Sheng Yang wrote:
> This patch enable per-vector mask for assigned devices using MSI-X.

The basic idea of kernel and QEmu's responsibilities are:

1. Because QEmu owned the irq routing table, so the change of table should still 
go to the QEmu, like we did in msix_mmio_write().

2. And the others things can be done in kernel, for performance. Here we covered 
the reading(converted entry from routing table and mask bit state of enabled MSI-X 
entries), and writing the mask bit for enabled MSI-X entries. Originally we only 
has mask bit handled in kernel, but later we found that Linux kernel would read 
MSI-X mmio just after every writing to mask bit, in order to flush the writing. So 
we add reading MSI data/addr as well.

3. Disabled entries's mask bit accessing would go to QEmu, because it may result 
in disable/enable MSI-X. Explained later.

4. Only QEmu has knowledge of PCI configuration space, so it's QEmu to 
decide enable/disable MSI-X for device.
.
5. There is an distinction between enabled entry and disabled entry of MSI-X 
table. The entries we had used for pci_enable_msix()(not necessary in sequence 
number) are already enabled, the others are disabled. When device's MSI-X is 
enabled and guest want to enable an disabled entry, we would go back to QEmu  
because this vector didn't exist in the routing table. Also due to 
pci_enable_msix() in kernel didn't allow us to enable vectors one by one, but all 
at once. So we have to disable MSI-X first, then enable it with new entries, which 
contained the new vector guest want to use. This situation is only happen when 
device is being initialized. After that, kernel can know and handle the mask bit 
of the enabled entry.

I've also considered handle all MMIO operation in kernel, and changing irq routing 
in kernel directly. But as long as irq routing is owned by QEmu, I think it's 
better to leave to it...

Notice the mask/unmask bits must be handled together, either in kernel or in 
userspace. Because if kernel has handled enabled vector's mask bit directly, it 
would be unsync with QEmu's records. It doesn't matter when QEmu don't access the 
related record. And the only place QEmu want to consult it's enabled entries' mask 
bit state is writing to MSI addr/data. The writing should be discarded if the 
entry is unmasked. This checking has already been done by kernel in this patchset, 
so we are fine here.

If we want to access the enabled entries' mask bit in the future, we can directly 
access device's MMIO. That's the reason why I have followed Michael's advice to 
use mask/unmask directly.

Hope this would make the patches more clear. I meant to add comments for this 
changeset, but miss it later.

--
regards
Yang, Sheng

> 
> Signed-off-by: Sheng Yang <sheng@xxxxxxxxxxxxxxx>
> ---
>  Documentation/kvm/api.txt |   22 ++++++++++++++++
>  arch/x86/kvm/x86.c        |    6 ++++
>  include/linux/kvm.h       |    8 +++++-
>  virt/kvm/assigned-dev.c   |   60
> +++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 95
> insertions(+), 1 deletions(-)
> 
> diff --git a/Documentation/kvm/api.txt b/Documentation/kvm/api.txt
> index d82d637..f324a50 100644
> --- a/Documentation/kvm/api.txt
> +++ b/Documentation/kvm/api.txt
> @@ -1087,6 +1087,28 @@ of 4 instructions that make up a hypercall.
>  If any additional field gets added to this structure later on, a bit for
> that additional piece of information will be set in the flags bitmap.
> 
> +4.47 KVM_ASSIGN_REG_MSIX_MMIO
> +
> +Capability: KVM_CAP_DEVICE_MSIX_MASK
> +Architectures: x86
> +Type: vm ioctl
> +Parameters: struct kvm_assigned_msix_mmio (in)
> +Returns: 0 on success, !0 on error
> +
> +struct kvm_assigned_msix_mmio {
> +	/* Assigned device's ID */
> +	__u32 assigned_dev_id;
> +	/* MSI-X table MMIO address */
> +	__u64 base_addr;
> +	/* Must be 0 */
> +	__u32 flags;
> +	/* Must be 0, reserved for future use */
> +	__u64 reserved;
> +};
> +
> +This ioctl would enable in-kernel MSI-X emulation, which would handle
> MSI-X +mask bit in the kernel.
> +
>  5. The kvm_run structure
> 
>  Application code obtains a pointer to the kvm_run structure by
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fc62546..ba07a2f 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -1927,6 +1927,8 @@ int kvm_dev_ioctl_check_extension(long ext)
>  	case KVM_CAP_X86_ROBUST_SINGLESTEP:
>  	case KVM_CAP_XSAVE:
>  	case KVM_CAP_ENABLE_CAP:
> +	case KVM_CAP_DEVICE_MSIX_EXT:
> +	case KVM_CAP_DEVICE_MSIX_MASK:
>  		r = 1;
>  		break;
>  	case KVM_CAP_COALESCED_MMIO:
> @@ -2717,6 +2719,10 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu
> *vcpu, return -EINVAL;
> 
>  	switch (cap->cap) {
> +	case KVM_CAP_DEVICE_MSIX_EXT:
> +		vcpu->kvm->arch.msix_flags_enabled = true;
> +		r = 0;
> +		break;
>  	default:
>  		r = -EINVAL;
>  		break;
> diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> index 0a7bd34..1494ed0 100644
> --- a/include/linux/kvm.h
> +++ b/include/linux/kvm.h
> @@ -540,6 +540,10 @@ struct kvm_ppc_pvinfo {
>  #endif
>  #define KVM_CAP_PPC_GET_PVINFO 57
>  #define KVM_CAP_PPC_IRQ_LEVEL 58
> +#ifdef __KVM_HAVE_MSIX
> +#define KVM_CAP_DEVICE_MSIX_EXT 59
> +#define KVM_CAP_DEVICE_MSIX_MASK 60
> +#endif
> 
>  #ifdef KVM_CAP_IRQ_ROUTING
> 
> @@ -671,6 +675,8 @@ struct kvm_clock_data {
>  #define KVM_XEN_HVM_CONFIG        _IOW(KVMIO,  0x7a, struct
> kvm_xen_hvm_config) #define KVM_SET_CLOCK             _IOW(KVMIO,  0x7b,
> struct kvm_clock_data) #define KVM_GET_CLOCK             _IOR(KVMIO, 
> 0x7c, struct kvm_clock_data) +#define KVM_ASSIGN_REG_MSIX_MMIO 
> _IOW(KVMIO,  0x7d, \
> +					struct kvm_assigned_msix_mmio)
>  /* Available with KVM_CAP_PIT_STATE2 */
>  #define KVM_GET_PIT2              _IOR(KVMIO,  0x9f, struct
> kvm_pit_state2) #define KVM_SET_PIT2              _IOW(KVMIO,  0xa0,
> struct kvm_pit_state2) @@ -802,7 +808,7 @@ struct kvm_assigned_msix_mmio {
>  	__u32 assigned_dev_id;
>  	__u64 base_addr;
>  	__u32 flags;
> -	__u32 reserved[2];
> +	__u64 reserved;
>  };
> 
>  #endif /* __LINUX_KVM_H */
> diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> index 5d2adc4..9573194 100644
> --- a/virt/kvm/assigned-dev.c
> +++ b/virt/kvm/assigned-dev.c
> @@ -17,6 +17,8 @@
>  #include <linux/pci.h>
>  #include <linux/interrupt.h>
>  #include <linux/slab.h>
> +#include <linux/irqnr.h>
> +
>  #include "irq.h"
> 
>  static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> list_head *head, @@ -169,6 +171,14 @@ static void deassign_host_irq(struct
> kvm *kvm, */
>  	if (assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX) {
>  		int i;
> +#ifdef __KVM_HAVE_MSIX
> +		if (assigned_dev->msix_mmio_base) {
> +			mutex_lock(&kvm->slots_lock);
> +			kvm_io_bus_unregister_dev(kvm, KVM_MMIO_BUS,
> +					&assigned_dev->msix_mmio_dev);
> +			mutex_unlock(&kvm->slots_lock);
> +		}
> +#endif
>  		for (i = 0; i < assigned_dev->entries_nr; i++)
>  			disable_irq_nosync(assigned_dev->
>  					   host_msix_entries[i].vector);
> @@ -318,6 +328,15 @@ static int assigned_device_enable_host_msix(struct kvm
> *kvm, goto err;
>  	}
> 
> +	if (dev->msix_mmio_base) {
> +		mutex_lock(&kvm->slots_lock);
> +		r = kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS,
> +				&dev->msix_mmio_dev);
> +		mutex_unlock(&kvm->slots_lock);
> +		if (r)
> +			goto err;
> +	}
> +
>  	return 0;
>  err:
>  	for (i -= 1; i >= 0; i--)
> @@ -870,6 +889,31 @@ static const struct kvm_io_device_ops msix_mmio_ops =
> { .write    = msix_mmio_write,
>  };
> 
> +static int kvm_vm_ioctl_register_msix_mmio(struct kvm *kvm,
> +				struct kvm_assigned_msix_mmio *msix_mmio)
> +{
> +	int r = 0;
> +	struct kvm_assigned_dev_kernel *adev;
> +
> +	mutex_lock(&kvm->lock);
> +	adev = kvm_find_assigned_dev(&kvm->arch.assigned_dev_head,
> +				      msix_mmio->assigned_dev_id);
> +	if (!adev) {
> +		r = -EINVAL;
> +		goto out;
> +	}
> +	if (msix_mmio->base_addr == 0) {
> +		r = -EINVAL;
> +		goto out;
> +	}
> +	adev->msix_mmio_base = msix_mmio->base_addr;
> +
> +	kvm_iodevice_init(&adev->msix_mmio_dev, &msix_mmio_ops);
> +out:
> +	mutex_unlock(&kvm->lock);
> +
> +	return r;
> +}
>  #endif
> 
>  long kvm_vm_ioctl_assigned_device(struct kvm *kvm, unsigned ioctl,
> @@ -982,6 +1026,22 @@ long kvm_vm_ioctl_assigned_device(struct kvm *kvm,
> unsigned ioctl, goto out;
>  		break;
>  	}
> +	case KVM_ASSIGN_REG_MSIX_MMIO: {
> +		struct kvm_assigned_msix_mmio msix_mmio;
> +
> +		r = -EFAULT;
> +		if (copy_from_user(&msix_mmio, argp, sizeof(msix_mmio)))
> +			goto out;
> +
> +		r = -EINVAL;
> +		if (msix_mmio.flags != 0 || msix_mmio.reserved != 0)
> +			goto out;
> +
> +		r = kvm_vm_ioctl_register_msix_mmio(kvm, &msix_mmio);
> +		if (r)
> +			goto out;
> +		break;
> +	}
>  #endif
>  	}
>  out:
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux