On 01/06/17 21:40, Auger Eric wrote: > Hi Alex, > > On 31/05/2017 20:24, Alex Williamson wrote: >> On Wed, 24 May 2017 22:13:18 +0200 >> Eric Auger <eric.auger@xxxxxxxxxx> wrote: >> >>> We add two new fields in vfio_pci_irq_ctx struct: deoi and handler. >>> If deoi is set, this means the physical IRQ attached to the virtual >>> IRQ is directly deactivated by the guest and the VFIO driver does >>> not need to disable the physical IRQ and mask it at VFIO level. >>> >>> The handler pointer is set accordingly and a wrapper handler is >>> introduced that calls the chosen handler function. >>> >>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx> >>> >>> --- >>> --- >>> drivers/vfio/pci/vfio_pci_intrs.c | 32 ++++++++++++++++++++++++++------ >>> drivers/vfio/pci/vfio_pci_private.h | 2 ++ >>> 2 files changed, 28 insertions(+), 6 deletions(-) >>> >>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c >>> index d4d377b..06aa713 100644 >>> --- a/drivers/vfio/pci/vfio_pci_intrs.c >>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c >>> @@ -121,11 +121,8 @@ void vfio_pci_intx_unmask(struct vfio_pci_device *vdev) >>> static irqreturn_t vfio_intx_handler(int irq, void *dev_id) >>> { >>> struct vfio_pci_device *vdev = dev_id; >>> - unsigned long flags; >>> int ret = IRQ_NONE; >>> >>> - spin_lock_irqsave(&vdev->irqlock, flags); >>> - >>> if (!vdev->pci_2_3) { >>> disable_irq_nosync(vdev->pdev->irq); >>> vdev->ctx[0].automasked = true; >>> @@ -137,14 +134,33 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id) >>> ret = IRQ_HANDLED; >>> } >>> >>> - spin_unlock_irqrestore(&vdev->irqlock, flags); >>> - >>> if (ret == IRQ_HANDLED) >>> vfio_send_intx_eventfd(vdev, NULL); >>> >>> return ret; >>> } >>> >>> +static irqreturn_t vfio_intx_handler_deoi(int irq, void *dev_id) >>> +{ >>> + struct vfio_pci_device *vdev = dev_id; >>> + >>> + vfio_send_intx_eventfd(vdev, NULL); >>> + return IRQ_HANDLED; >>> +} >>> + >>> +static irqreturn_t vfio_intx_wrapper_handler(int irq, void *dev_id) >>> +{ >>> + struct vfio_pci_device *vdev = dev_id; >>> + unsigned long flags; >>> + irqreturn_t ret; >>> + >>> + spin_lock_irqsave(&vdev->irqlock, flags); >>> + ret = vdev->ctx[0].handler(irq, dev_id); >>> + spin_unlock_irqrestore(&vdev->irqlock, flags); >>> + >>> + return ret; >>> +} >>> + >>> static int vfio_intx_enable(struct vfio_pci_device *vdev) >>> { >>> if (!is_irq_none(vdev)) >>> @@ -208,7 +224,11 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd) >>> if (!vdev->pci_2_3) >>> irqflags = 0; >>> >>> - ret = request_irq(pdev->irq, vfio_intx_handler, >>> + if (vdev->ctx[0].deoi) >>> + vdev->ctx[0].handler = vfio_intx_handler_deoi; >>> + else >>> + vdev->ctx[0].handler = vfio_intx_handler; >>> + ret = request_irq(pdev->irq, vfio_intx_wrapper_handler, >>> irqflags, vdev->ctx[0].name, vdev); >> >> >> Here's where I think we don't account for irqflags properly. If we get >> a shared interrupt here, then enabling direct EOI needs to be disabled >> or else we'll starve other devices sharing the interrupt. In practice, >> I wonder if this makes PCI direct EOI a useful feature. We could try >> to get an exclusive interrupt and fallback to shared, but any time we >> get an exclusive interrupt we're more prone to conflicts with other >> devices. I might have two VMs that share an interrupt and now it's a >> race that only the first to setup an IRQ can work. Worse, one of those >> VMs might be fully booted and switched to MSI and now it's just a >> matter of time until they reboot in the right way to generate a >> conflict. I might also have two devices in the same VM that share an >> IRQ and now I can't start the VM at all because the second device can >> no longer get an interrupt. This is the same problem we have with the >> nointxmask flag, it's a useful debugging feature but since the masking >> is done at the APIC/GIC rather than the device, much like here, it's not >> very practical for more than debugging and isolating specific devices >> as requiring APIC/GIC level masking. I'm not sure how to proceed on the >> PCI side here. Thanks, > > Thanks for the review. > > I share you concerns about shared interrupts. I need to spend some > additional time reading the VFIO code related to those and that part of > the PCIe spec too. > > Maybe we shall introduce the IRQ bypass based DEOI setup only for > platform devices. I know those are not much used at the moment but this > at least prepares for GICv4 direct injection. I think that'd be good enough, unless we can ensure that we only engage the Direct EOI feature when PCI legacy interrupts are not shared. The system I have here (AMD Seattle) seem to have one set of INTx per PCIe port, so the above would definitely work on it. But I understand that there is not a guarantee at all. Maybe the "nointxmask" flag is not that bad a solution in that case. It would clearly outline that the user knows the platform is safe, and that we can use the Direct EOI feature. Thanks, M. -- Jazz is not dead. It just smells funny...