Re: [PATCH kernel] vfio: Print message about masking INTx only when it is known to be broken

Alexey Kardashevskiy <aik@xxxxxxxxx> · Thu, 22 Mar 2018 18:32:56 +1100

On 22/3/18 3:40 pm, Alex Williamson wrote:
> On Thu, 22 Mar 2018 15:02:09 +1100
> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
> 
>> On 19/3/18 1:15 pm, Alex Williamson wrote:
>>> On Mon, 19 Mar 2018 01:05:27 +1100
>>> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
>>>   
>>>> On 15/3/18 2:02 pm, Alex Williamson wrote:  
>>>>> On Tue, 13 Mar 2018 13:38:46 +1100
>>>>> Alexey Kardashevskiy <aik@xxxxxxxxx> wrote:
>>>>>     
>>>>>> On 13/3/18 9:44 am, Alex Williamson wrote:       
>>>>>  Should we only be invoking the nointx path
>>>>> if a) the device supports intx (pin reg), b) pdev->irq is null, and c)
>>>>> the device passes the v2.3 masking test (which is cached, not retested
>>>>> now, btw)?      
>>>>
>>>> Yes we should, and yes I missed that last bit about caching the result of
>>>> the test which is performed anyway in the common pci code. Thanks,  
>>>
>>> I spent a while considering whether we should include c) in the patch I
>>> posted, but I couldn't get past the question of what we'd do with the
>>> device if it doesn't pass the INTx masking test.  We'd want to emulate
>>> the INTx pin register and report no INTx irq info, as the nointx path
>>> does, then what?  If writing DisINTx doesn't do anything, we need to
>>> decide whether lack of INTx routing means the device can harmlessly
>>> pull INTx, or if pulling INTx is not harmless, then we'd need to ban
>>> the device from being assigned to the user.  Since we don't know of any
>>> problems and it seems sort of reasonable to assume that if there's no
>>> INTx routing that we can tug it all we want (because if someone is
>>> listening, it's a platform bug that it's not routed), I left it alone
>>> in my follow-up patch.  Let me know if you have other thoughts.  Thanks,  
>>
>>
>> Well, pulling INTx is nasty anyway as a device may die afterwards and the
>> specific INTx line (or a latch in PCIe bridge) will be blocked for the
>> devices sharing the same line, and DisINTx does not change that.
> 
> "... a device may die afterwards"?  What does that mean?  

I meant the device could be blocked by EEH after raising INTx.

> In normal
> operation, INTx fires, DisINTx gets set, the interrupt is forwarded to
> the user, the user services and unmasks the interrupt, rinse and
> repeat.  If the device somehow gets into a locked state where it's
> pulling INTx and disregarding DisINTx, a) this is broken hardware, b)
> the spurious interrupt handler will mask the interrupt and switch to
> polling, penalizing everyone sharing that line.  I don't know that
> there's a model where we can factor in catastrophic hardware failure
> into the equation.
>>> Our other
>> hypervisor (pHyp) simply prohibits INTx in order to avoid dealing with this
>> (this is why 2170dd04316e was made). So not enforcing nointx for
>> DisINTx-incapable devices does not seem much worse than enabling INTx for
>> pass through itself.
> 
> Too. Many. Negatives.  If a device shares an INTx line with other
> devices then it certainly needs to support DisINTx in order to play
> nicely with userspace drivers.  We require exclusive interrupts for
> devices that do not support DisINTx.  If the hypervisor is choosing to
> avoid dealing with that problem by not providing routing for INTx
> (effectively a paravirt please don't use this interface), then it seems
> like it's not fit to support userspace drivers. It's depending on the
> cooperation of the user. We can't prevent the user from making the card
> pull INTx, we can only discourage them from using it, which is not
> sufficient. Therefore if we really do need to block use of such
> devices (not INTx routing, no DisINTx support) from using vfio, my patch
> is insufficient and I think the best approach at this stage is to revert
> 2170dd04316e and you can rehash a solution for pHyp.  Thanks,

No, I do not want to block INTx and I am happy with "[PATCH v2] vfio/pci:
Quiet broken INTx whining when INTx is unsupported by device". I am missing
the point here, sorry, no-INTx-routing + no-DisINTx + no-MSI(X) - I
understand why we would want to block these but if a device cannot do INTx
nicely but still can do MSI(X) - why would we want to block it?

-- 
Alexey