Re: [PATCH 05/10] VFIO: pci: Introduce direct EOI INTx interrupt handler

Marc Zyngier <marc.zyngier@xxxxxxx> · Fri, 2 Jun 2017 09:41:34 +0100

On 01/06/17 21:40, Auger Eric wrote:
> Hi Alex,
> 
> On 31/05/2017 20:24, Alex Williamson wrote:
>> On Wed, 24 May 2017 22:13:18 +0200
>> Eric Auger <eric.auger@xxxxxxxxxx> wrote:
>>
>>> We add two new fields in vfio_pci_irq_ctx struct: deoi and handler.
>>> If deoi is set, this means the physical IRQ attached to the virtual
>>> IRQ is directly deactivated by the guest and the VFIO driver does
>>> not need to disable the physical IRQ and mask it at VFIO level.
>>>
>>> The handler pointer is set accordingly and a wrapper handler is
>>> introduced that calls the chosen handler function.
>>>
>>> Signed-off-by: Eric Auger <eric.auger@xxxxxxxxxx>
>>>
>>> ---
>>> ---
>>>  drivers/vfio/pci/vfio_pci_intrs.c   | 32 ++++++++++++++++++++++++++------
>>>  drivers/vfio/pci/vfio_pci_private.h |  2 ++
>>>  2 files changed, 28 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
>>> index d4d377b..06aa713 100644
>>> --- a/drivers/vfio/pci/vfio_pci_intrs.c
>>> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
>>> @@ -121,11 +121,8 @@ void vfio_pci_intx_unmask(struct vfio_pci_device *vdev)
>>>  static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
>>>  {
>>>  	struct vfio_pci_device *vdev = dev_id;
>>> -	unsigned long flags;
>>>  	int ret = IRQ_NONE;
>>>  
>>> -	spin_lock_irqsave(&vdev->irqlock, flags);
>>> -
>>>  	if (!vdev->pci_2_3) {
>>>  		disable_irq_nosync(vdev->pdev->irq);
>>>  		vdev->ctx[0].automasked = true;
>>> @@ -137,14 +134,33 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
>>>  		ret = IRQ_HANDLED;
>>>  	}
>>>  
>>> -	spin_unlock_irqrestore(&vdev->irqlock, flags);
>>> -
>>>  	if (ret == IRQ_HANDLED)
>>>  		vfio_send_intx_eventfd(vdev, NULL);
>>>  
>>>  	return ret;
>>>  }
>>>  
>>> +static irqreturn_t vfio_intx_handler_deoi(int irq, void *dev_id)
>>> +{
>>> +	struct vfio_pci_device *vdev = dev_id;
>>> +
>>> +	vfio_send_intx_eventfd(vdev, NULL);
>>> +	return IRQ_HANDLED;
>>> +}
>>> +
>>> +static irqreturn_t vfio_intx_wrapper_handler(int irq, void *dev_id)
>>> +{
>>> +	struct vfio_pci_device *vdev = dev_id;
>>> +	unsigned long flags;
>>> +	irqreturn_t ret;
>>> +
>>> +	spin_lock_irqsave(&vdev->irqlock, flags);
>>> +	ret = vdev->ctx[0].handler(irq, dev_id);
>>> +	spin_unlock_irqrestore(&vdev->irqlock, flags);
>>> +
>>> +	return ret;
>>> +}
>>> +
>>>  static int vfio_intx_enable(struct vfio_pci_device *vdev)
>>>  {
>>>  	if (!is_irq_none(vdev))
>>> @@ -208,7 +224,11 @@ static int vfio_intx_set_signal(struct vfio_pci_device *vdev, int fd)
>>>  	if (!vdev->pci_2_3)
>>>  		irqflags = 0;
>>>  
>>> -	ret = request_irq(pdev->irq, vfio_intx_handler,
>>> +	if (vdev->ctx[0].deoi)
>>> +		vdev->ctx[0].handler = vfio_intx_handler_deoi;
>>> +	else
>>> +		vdev->ctx[0].handler = vfio_intx_handler;
>>> +	ret = request_irq(pdev->irq, vfio_intx_wrapper_handler,
>>>  			  irqflags, vdev->ctx[0].name, vdev);
>>
>>
>> Here's where I think we don't account for irqflags properly.  If we get
>> a shared interrupt here, then enabling direct EOI needs to be disabled
>> or else we'll starve other devices sharing the interrupt.  In practice,
>> I wonder if this makes PCI direct EOI a useful feature.  We could try
>> to get an exclusive interrupt and fallback to shared, but any time we
>> get an exclusive interrupt we're more prone to conflicts with other
>> devices.  I might have two VMs that share an interrupt and now it's a
>> race that only the first to setup an IRQ can work.  Worse, one of those
>> VMs might be fully booted and switched to MSI and now it's just a
>> matter of time until they reboot in the right way to generate a
>> conflict.  I might also have two devices in the same VM that share an
>> IRQ and now I can't start the VM at all because the second device can
>> no longer get an interrupt.  This is the same problem we have with the
>> nointxmask flag, it's a useful debugging feature but since the masking
>> is done at the APIC/GIC rather than the device, much like here, it's not
>> very practical for more than debugging and isolating specific devices
>> as requiring APIC/GIC level masking.  I'm not sure how to proceed on the
>> PCI side here. Thanks,
> 
> Thanks for the review.
> 
> I share you concerns about shared interrupts. I need to spend some
> additional time reading the VFIO code related to those and that part of
> the PCIe spec too.
> 
> Maybe we shall introduce the IRQ bypass based DEOI setup only for
> platform devices. I know those are not much used at the moment but this
> at least prepares for GICv4 direct injection.

I think that'd be good enough, unless we can ensure that we only engage
the Direct EOI feature when PCI legacy interrupts are not shared. The
system I have here (AMD Seattle) seem to have one set of INTx per PCIe
port, so the above would definitely work on it. But I understand that
there is not a guarantee at all.

Maybe the "nointxmask" flag is not that bad a solution in that case. It
would clearly outline that the user knows the platform is safe, and that
we can use the Direct EOI feature.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...