On Wed, Oct 13, 2010 at 09:03:11AM +0800, Sheng Yang wrote:
> On Wednesday 13 October 2010 02:28:59 Michael S. Tsirkin wrote:
> > On Tue, Oct 12, 2010 at 02:49:58PM +0800, Sheng Yang wrote:
> > > On Monday 11 October 2010 18:01:00 Michael S. Tsirkin wrote:
> > > > On Mon, Oct 11, 2010 at 05:28:30PM +0800, Sheng Yang wrote:
> > > > > On Sunday 03 October 2010 19:12:47 Michael S. Tsirkin wrote:
> > > > > > On Tue, Sep 28, 2010 at 05:44:10PM +0800, Sheng Yang wrote:
> > > > > > > This patch enable per-vector mask for assigned devices using
> > > > > > > MSI-X.
> > > > > > >
> > > > > > > Signed-off-by: Sheng Yang <sheng@xxxxxxxxxxxxxxx>
> > > > > >
> > > > > > I think I see an issue here, noted below. Some general comments:
> > > > > > - The mask bit seems broken for injecting interrupts from
> > > > > >   userspace (with interrupts/ioctls).
> > > > > >   I think we must test it on injection path.
> > > > >
> > > > > I am not quite understand how it related to userspace interrupt
> > > > > injection here... This patch only cover assigned devices for now.
> > > >
> > > > Well, this is a kernel/userspace interface, if it's broken for
> > > > userspace injection now we'll have to go through pain to fix it in a
> > > > compatible way later when we want to use it for userspace injection.
> > > > You might want to ask why we want the kernel to support making
> > > > userspace-injected interrupts when userspace can just avoid injecting
> > > > them, and the answer would be that with irqfd the injection might be
> > > > handled in a separate process.
> > >
> > > OK, I've understood how it related to userspace interrupt injection. But
> > > I still can't see why the interface is broken...
> > >
> > > > We currently handle this by destroying irqfd when irq is masked,
> > > > an ioctl instead would be much faster.
> > > >
> > > > > > - We'll need a way to support the pending bit.
> > > > > >   Disabling interrupts will not let us do it.
> > > > >
> > > > > We may need a way to support pending bit, though I don't know which
> > > > > guest has used it... And we can still know if there is interrupt
> > > > > pending by check the real hardware's pending bit if it's necessary.
> > > >
> > > > That's what I'm saying: since instead of masking the vector in hardware
> > > > you disable irq in the APIC, the pending bit that we read from hardware
> > > > will not have the correct value.
> > >
> > > Are you sure? This disable_irq() has nothing to do with APIC. The disable
> > > callback in msi_chip didn't do anything but mark the IRQ status as
> > > IRQ_DISABLED, and the follow interrupt(if there are any) would be acked
> > > and masked, using mask callback in msi_chip.
> > >
> > > irq_to_desc() need to be exported for my initial version, in order to use
> > > mask callback. But later I think it would be clear and better if we use
> > > general IRQ function to do it. And I don't think the current solution
> > > would prevent us from reading hardware pending bits.
> >
> > Not sure, I'll try to look into code later, but just based on this
> > description:
> >
> > on real hardware:
> >     mask
> >     interrupt
> >     results in both pending bit being set
> >
> > on guest with assigned device
> >     mask
> >     interrupt
> >     results in mask being set but pending bit not set
> >     (as interrupt was already sent)
> >
> > So if we try to look at pending bits looks like we'll miss
> > some interrupts. No?
>
> Yes, there is one interrupt would fail to set the pending bit. But I
> don't think it worth export/adding new interface for core irq handling
> functions now...

Yea, maybe. For assigned devices pending bit is not currently
implemented, so at least it's not a regression, and we can delay fixing
this until we have VFIO.

> Still,
> I don't think pending bit matters much. And with current code, we can
> still make pending bit works well on the most condition. I think that's
> good enough.

Hmm, my guess is it's probably better to have a constant 0 there than a
subtle race that will only trigger under load.

> --
> regards
> Yang, Sheng
>
> > > --
> > > regards
> > > Yang, Sheng
> > >
> > > > If we fix this, pending bit handling can be done by userspace.
> > > >
> > > > > (And we haven't seen any problem by
> > > > > leaving the bit 0 so far, and it's not in this patch's scope.)
> > > >
> > > > I don't know about anyone using this, either, but the PCI spec does
> > > > require support of polling mode where the pending bit is polled instead
> > > > of interrupts. So yes, not a high priority to implement, but let's give
> > > > the way we intend to support this in the future some thought.
> > > >
> > > > > > > ---
> > > > > > >
> > > > > > >  arch/x86/kvm/x86.c       |    1 +
> > > > > > >  include/linux/kvm.h      |    9 ++++++++-
> > > > > > >  include/linux/kvm_host.h |    1 +
> > > > > > >  virt/kvm/assigned-dev.c  |   39 +++++++++++++++++++++++++++++++++++++++
> > > > > > >  4 files changed, 49 insertions(+), 1 deletions(-)
> > > > > > >
> > > > > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > > > > index 8412c91..e6933e6 100644
> > > > > > > --- a/arch/x86/kvm/x86.c
> > > > > > > +++ b/arch/x86/kvm/x86.c
> > > > > > > @@ -1927,6 +1927,7 @@ int kvm_dev_ioctl_check_extension(long ext)
> > > > > > >  	case KVM_CAP_DEBUGREGS:
> > > > > > >  	case KVM_CAP_X86_ROBUST_SINGLESTEP:
> > > > > > >  	case KVM_CAP_XSAVE:
> > > > > > > +	case KVM_CAP_DEVICE_MSIX_MASK:
> > > > > > >  		r = 1;
> > > > > > >  		break;
> > > > > > >  	case KVM_CAP_COALESCED_MMIO:
> > > > > > > diff --git a/include/linux/kvm.h b/include/linux/kvm.h
> > > > > > > index 919ae53..f2b7cdc 100644
> > > > > > > --- a/include/linux/kvm.h
> > > > > > > +++ b/include/linux/kvm.h
> > > > > > > @@ -540,6 +540,9 @@ struct kvm_ppc_pvinfo {
> > > > > > >  #endif
> > > > > > >  #define KVM_CAP_PPC_GET_PVINFO 57
> > > > > > >  #define KVM_CAP_PPC_IRQ_LEVEL 58
> > > > > > > +#ifdef __KVM_HAVE_MSIX
> > > > > > > +#define KVM_CAP_DEVICE_MSIX_MASK 59
> > > > > > > +#endif
> > > > > > >
> > > > > > >  #ifdef KVM_CAP_IRQ_ROUTING
> > > > > > >
> > > > > > > @@ -787,11 +790,15 @@ struct kvm_assigned_msix_nr {
> > > > > > >  };
> > > > > > >
> > > > > > >  #define KVM_MAX_MSIX_PER_DEV		256
> > > > > > > +
> > > > > > > +#define KVM_MSIX_FLAG_MASK	1
> > > > > > > +
> > > > > > >  struct kvm_assigned_msix_entry {
> > > > > > >  	__u32 assigned_dev_id;
> > > > > > >  	__u32 gsi;
> > > > > > >  	__u16 entry; /* The index of entry in the MSI-X table */
> > > > > > > -	__u16 padding[3];
> > > > > > > +	__u16 flags;
> > > > > > > +	__u16 padding[2];
> > > > > > >  };
> > > > > > >
> > > > > > >  #endif /* __LINUX_KVM_H */
> > > > > > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > > > > > index 0b89d00..a43405c 100644
> > > > > > > --- a/include/linux/kvm_host.h
> > > > > > > +++ b/include/linux/kvm_host.h
> > > > > > > @@ -415,6 +415,7 @@ struct kvm_irq_ack_notifier {
> > > > > > >  };
> > > > > > >
> > > > > > >  #define KVM_ASSIGNED_MSIX_PENDING		0x1
> > > > > > > +#define KVM_ASSIGNED_MSIX_MASK			0x2
> > > > > > >  struct kvm_guest_msix_entry {
> > > > > > >  	u32 vector;
> > > > > > >  	u16 entry;
> > > > > > > diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
> > > > > > > index 7c98928..15b8c32 100644
> > > > > > > --- a/virt/kvm/assigned-dev.c
> > > > > > > +++ b/virt/kvm/assigned-dev.c
> > > > > > > @@ -17,6 +17,8 @@
> > > > > > >  #include <linux/pci.h>
> > > > > > >  #include <linux/interrupt.h>
> > > > > > >  #include <linux/slab.h>
> > > > > > > +#include <linux/irqnr.h>
> > > > > > > +
> > > > > > >  #include "irq.h"
> > > > > > >
> > > > > > >  static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
> > > > > > > @@ -666,6 +668,30 @@ msix_nr_out:
> > > > > > >  	return r;
> > > > > > >  }
> > > > > > >
> > > > > > > +static void update_msix_mask(struct kvm_assigned_dev_kernel *assigned_dev,
> > > > > > > +			     int index)
> > > > > > > +{
> > > > > > > +	int irq;
> > > > > > > +
> > > > > > > +	if (!assigned_dev->dev->msix_enabled ||
> > > > > > > +	    !(assigned_dev->irq_requested_type & KVM_DEV_IRQ_HOST_MSIX))
> > > > > > > +		return;
> > > > > > > +
> > > > > > > +	irq = assigned_dev->host_msix_entries[index].vector;
> > > > > > > +
> > > > > > > +	ASSERT(irq != 0);
> > > > > > > +
> > > > > > > +	if (assigned_dev->guest_msix_entries[index].flags &
> > > > > > > +			KVM_ASSIGNED_MSIX_MASK)
> > > > > > > +		disable_irq(irq);
> > > > > > > +	else {
> > > > > > > +		enable_irq(irq);
> > > > > > > +		if (assigned_dev->guest_msix_entries[index].flags &
> > > > > > > +				KVM_ASSIGNED_MSIX_PENDING)
> > > > > > > +			schedule_work(&assigned_dev->interrupt_work);
> > > > > > > +	}
> > > > > > > +}
> > > > > > > +
> > > > > >
> > > > > > What happens if guest masks an entry and then we hot-unplug the
> > > > > > device and remove it from guest? It looks like interrupt
> > > > > > will stay disabled?
> > > > >
> > > > > I don't think so. pci_disable_msix() which was called in hot-unplug
> > > > > path would recycle all IRQs used by the device. It should be the
> > > > > same as VM shutdown.
> > > > >
> > > > > Also before the IRQ was recycled, I believe the same dynamic IRQ
> > > > > wouldn't be used by other devices.
> > > > >
> > > > > --
> > > > > regards
> > > > > Yang, Sheng
> > > > >
> > > > > > >  static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > > > > >  				       struct kvm_assigned_msix_entry *entry)
> > > > > > >  {
> > > > > > > @@ -688,6 +714,19 @@ static int kvm_vm_ioctl_set_msix_entry(struct kvm *kvm,
> > > > > > >  			adev->guest_msix_entries[i].entry = entry->entry;
> > > > > > >  			adev->guest_msix_entries[i].vector = entry->gsi;
> > > > > > >  			adev->host_msix_entries[i].entry = entry->entry;
> > > > > > > +			if ((entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > > > > +			    !(adev->guest_msix_entries[i].flags &
> > > > > > > +			      KVM_ASSIGNED_MSIX_MASK)) {
> > > > > > > +				adev->guest_msix_entries[i].flags |=
> > > > > > > +					KVM_ASSIGNED_MSIX_MASK;
> > > > > > > +				update_msix_mask(adev, i);
> > > > > > > +			} else if (!(entry->flags & KVM_MSIX_FLAG_MASK) &&
> > > > > > > +				   (adev->guest_msix_entries[i].flags &
> > > > > > > +				    KVM_ASSIGNED_MSIX_MASK)) {
> > > > > > > +				adev->guest_msix_entries[i].flags &=
> > > > > > > +					~KVM_ASSIGNED_MSIX_MASK;
> > > > > > > +				update_msix_mask(adev, i);
> > > > > > > +			}
> > > > > > >  			break;
> > > > > > >  		}
> > > > > > >  	if (i == adev->entries_nr) {
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html
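
For reference, the mask/pending behaviour on real hardware that this
thread compares against can be illustrated with a small user-space
model. This is only a sketch under assumptions: the names msix_model,
msix_model_raise and msix_model_set_mask are hypothetical and are not
part of the patch or of any kernel API. It models the MSI-X semantics
as I understand them from the PCI spec: an interrupt raised while a
vector is masked latches that vector's pending bit, and unmasking
delivers the message once and clears the bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of per-vector MSI-X state; not kernel code. */
#define MODEL_NR_VECTORS 4

struct msix_model {
	bool masked[MODEL_NR_VECTORS];            /* per-vector mask bit */
	bool pending[MODEL_NR_VECTORS];           /* per-vector pending bit */
	unsigned int delivered[MODEL_NR_VECTORS]; /* messages actually sent */
};

/* The device wants to signal vector idx. */
static void msix_model_raise(struct msix_model *m, size_t idx)
{
	assert(idx < MODEL_NR_VECTORS);
	if (m->masked[idx])
		m->pending[idx] = true;  /* masked: latch in the pending bit */
	else
		m->delivered[idx]++;     /* unmasked: send the message now */
}

/* Software writes the mask bit of vector idx. */
static void msix_model_set_mask(struct msix_model *m, size_t idx, bool mask)
{
	assert(idx < MODEL_NR_VECTORS);
	m->masked[idx] = mask;
	if (!mask && m->pending[idx]) {
		/* Unmask with pending set: deliver once, clear pending. */
		m->pending[idx] = false;
		m->delivered[idx]++;
	}
}
```

In this model, masking vector 0, raising it twice and then unmasking
leaves the pending bit clear and exactly one message delivered. The
concern raised above is that with disable_irq() an interrupt that was
already sent before the mask took effect does not show up in the
hardware pending bit, so a guest polling that bit could miss it.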