Re: [PATCH 1/3] KVM: Emulation MSI-X mask bits for assigned devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > This patch enable per-vector mask for assigned devices using MSI-X.
> > > 
> > > Signed-off-by: Sheng Yang <sheng@xxxxxxxxxxxxxxx>
> > 
> > I think I see an issue here, noted below.  Some general comments:
> > - The mask bit seems broken for injecting interrupts from
> >   userspace (with interrupts/ioctls).
> >   I think we must test it on injection path.
> 
> I am not quite understand how it related to userspace interrupt injection here... 
> This patch only cover assigned devices for now.

Well, this is a kernel/userspace interface, if it's broken for userspace
injection now we'll have to go through pain to fix it in a compatible
way later when we want to use it for userspace injection.
You might want to ask why we want the kernel to support making
userspace-injected interrupts when userspace can just avoid injecting
them, and the answer would be that with irqfd the injection might be
handled in a separate process.

We currently handle this by destroying irqfd when irq is masked,
an ioctl instead would be much faster.

> > - We'll need a way to support the pending bit.
> >   Disabling interrupts will not let us do it.
> 
> We may need a way to support pending bit, though I don't know which guest has used 
> it... And we can still know if there is interrupt pending by check the real 
> hardware's pending bit if it's necessary.

That's what I'm saying: since instead of masking the vector in hardware
you disable irq in the APIC, the pending bit that we read from hardware
will not have the correct value.

If we fix this, pending bit handling can be done by userspace.

> (And we haven't seen any problem by 
> leaving the bit 0 so far, and it's not in this patch's scope.)

I don't know about anyone using this, either, but the PCI spec does
require support of polling mode where the pending bit is polled instead
of interrupts. So yes, not a high priority to implement, but let's give
the way we intend to support this in the future some thought.

> > > ---
> > > 
> > >  arch/x86/kvm/x86.c       |    1 +
> > >  include/linux/kvm.h      |    9 ++++++++-
> > >  include/linux/kvm_host.h |    1 +
> > >  virt/kvm/assigned-dev.c  |   39 
> +++++++++++++++++++++++++++++++++++++++
> > >  4 files changed, 49 insertions(+), 1 deletions(-)
> > > 
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 8412c91..e6933e6 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > 
> > >  	case KVM_CAP_DEBUGREGS:
> > >  	case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > > 
> > >  	case KVM_CAP_XSAVE:
> > > +	case KVM_CAP_DEVICE_MSIX_MASK:
> > >  		r = 1;
> > >  		break;
> > >  	
> > >  	case KVM_CAP_COALESCED_MMIO:
> > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > index 919ae53..f2b7cdc 100644
> > > --- a/include/linux/kvm.h
> > > +++ b/include/linux/kvm.h
> > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > > 
> > >  #endif
> > >  #define KVM_CAP_PPC_GET_PVINFO 57
> > >  #define KVM_CAP_PPC_IRQ_LEVEL 58
> > > 
> > > +#ifdef __KVM_HAVE_MSIX
> > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > +#endif
> > > 
> > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > 
> > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > > 
> > >  };
> > >  
> > >  #define KVM_MAX_MSIX_PER_DEV		256
> > > 
> > > +
> > > +#define KVM_MSIX_FLAG_MASK	1
> > > +
> > > 
> > >  struct kvm_assigned_msix_entry {
> > >  
> > >  	__u32 assigned_dev_id;
> > >  	__u32 gsi;
> > >  	__u16 entry; /* The index of entry in the MSI-X table */
> > > 
> > > -	__u16 padding[3];
> > > +	__u16 flags;
> > > +	__u16 padding[2];
> > > 
> > >  };
> > >  
> > >  #endif /* __LINUX_KVM_H */
> > > 
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 0b89d00..a43405c 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > > 
> > >  };
> > >  
> > >  #define KVM_ASSIGNED_MSIX_PENDING		0x1
> > > 
> > > +#define KVM_ASSIGNED_MSIX_MASK			0x2
> > > 
> > >  struct kvm_guest_msix_entry {
> > >  
> > >  	u32 vector;
> > >  	u16 entry;
> > > 
> > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > index 7c98928..15b8c32 100644
> > > --- a/virt/kvm/assigned-dev.c
> > > +++ b/virt/kvm/assigned-dev.c
> > > @@ -17,6 +17,8 @@
> > > 
> > >  #include <linux/pci.h>
> > >  #include <linux/interrupt.h>
> > >  #include <linux/slab.h>
> > > 
> > > +#include <linux/irqnr.h>
> > > +
> > > 
> > >  #include "irq.h"
> > >  
> > >  static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct
> > >  list_head *head,
> > > 
> > > @@ -666,6 +668,30 @@ msix_nr_out:
> > >  	return r;
> > >  
> > >  }
> > > 
> > > +static void update_msix_mask(struct kvm_assigned_dev_kernel
> > > *assigned_dev, +			     int index)
> > > +{
> > > +	int irq;
> > > +
> > > +	if (!assigned_dev->dev->msix_enabled ||
> > > +	    !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > > +		return;
> > > +
> > > +	irq = assigned_dev->host_msix_entries[index].vector;
> > > +
> > > +	ASSERT(irq != 0);
> > > +
> > > +	if (assigned_dev->guest_msix_entries[index].flags &
> > > +			KVM_ASSIGNED_MSIX_MASK)
> > > +		disable_irq(irq);
> > > +	else {
> > > +		enable_irq(irq);
> > > +		if (assigned_dev->guest_msix_entries[index].flags &
> > > +				KVM_ASSIGNED_MSIX_PENDING)
> > > +			schedule_work(&assigned_dev->interrupt_work);
> > > +	}
> > > +}
> > > +
> > 
> > What happens if guest masks an entry and then we hot-unplug the
> > device and remove it from guest? It looks like interrupt
> > will stay disabled?
> 
> I don't think so. pci_disable_msix() which was called in hot-unplug path would 
> recycle all IRQs used by the device. It should be the same as VM shutdown.
> 
> Also before the IRQ was recycled, I believe the same dynamic IRQ wouldn't be used 
> by other devices.
> 
> --
> regards
> Yang, Sheng
> 
> > 
> > >  static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > >  
> > >  				       struct kvm_assigned_msix_entry *entry)
> > >  
> > >  {
> > > 
> > > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm
> > > *kvm,
> > > 
> > >  			adev->guest_msix_entries[i].entry = entry->entry;
> > >  			adev->guest_msix_entries[i].vector = entry->gsi;
> > >  			adev->host_msix_entries[i].entry = entry->entry;
> > > 
> > > +			if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > +					!(adev->guest_msix_entries[i].flags &
> > > +					KVM_ASSIGNED_MSIX_MASK)) {
> > > +				adev->guest_msix_entries[i].flags |=
> > > +					KVM_ASSIGNED_MSIX_MASK;
> > > +				update_msix_mask(adev, i);
> > > +			} else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > +					(adev->guest_msix_entries[i].flags &
> > > +					KVM_ASSIGNED_MSIX_MASK)) {
> > > +				adev->guest_msix_entries[i].flags &=
> > > +					~KVM_ASSIGNED_MSIX_MASK;
> > > +				update_msix_mask(adev, i);
> > > +			}
> > > 
> > >  			break;
> > >  		
> > >  		}
> > >  	
> > >  	if (i == adev->entries_nr) {
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux