Re: [PROBLEM] Frequently get "irq 31: nobody cared" when passing through 2x GPUs that share the same PCI switch via vfio

Alex Williamson <alex.williamson@xxxxxxxxxx> · Wed, 15 Sep 2021 10:32:35 -0600

On Wed, 15 Sep 2021 16:44:38 +1200
Matthew Ruffell <matthew.ruffell@xxxxxxxxxxxxx> wrote:
> On 15/09/21 4:43 am, Alex Williamson wrote:
> > 
> > FWIW, I have access to a system with an NVIDIA K1 and M60, both use
> > this same switch on-card and I've not experienced any issues assigning
> > all the GPUs to a single VM.  Topo:
> > 
> >  +-[0000:40]-+-02.0-[42-47]----00.0-[43-47]--+-08.0-[44]----00.0
> >  |                                           +-09.0-[45]----00.0
> >  |                                           +-10.0-[46]----00.0
> >  |                                           \-11.0-[47]----00.0
> >  \-[0000:00]-+-03.0-[04-07]----00.0-[05-07]--+-08.0-[06]----00.0
> >                                              \-10.0-[07]----00.0

I've actually found that the above configuration, assigning all 6 GPUs
to a VM, reproduces this pretty readily by simply rebooting the VM.  In
my case, I don't have the panic-on-warn/oops that must be set on your
kernel, so the result is far more benign: the IRQ gets masked until
it's re-registered.
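For context on why the IRQ ends up masked: the kernel's spurious-interrupt
detector (note_interrupt() in kernel/irq/spurious.c) disables a line when
nearly every interrupt in a window goes unclaimed.  Below is only a
simplified sketch of that accounting, assuming the 100,000-interrupt
window and 99,900-unhandled threshold and ignoring the detector's
time-based resets; the IrqLine class and its methods are hypothetical.

```python
# Simplified model of the kernel's spurious-IRQ accounting (hypothetical
# names). Keeps only the 100,000-interrupt window and the 99,900-unhandled
# threshold; the real note_interrupt() also has time-based resets.
IRQ_WINDOW = 100_000
UNHANDLED_THRESHOLD = 99_900


class IrqLine:
    def __init__(self) -> None:
        self.count = 0         # interrupts seen in the current window
        self.unhandled = 0     # of those, how many no handler claimed
        self.disabled = False  # set once the line is declared stuck

    def note(self, handled: bool) -> bool:
        """Account one interrupt; return True if the line was just disabled."""
        if self.disabled:
            return False
        if not handled:
            self.unhandled += 1
        self.count += 1
        if self.count < IRQ_WINDOW:
            return False
        # End of window: decide whether the line looks stuck.
        stuck = self.unhandled > UNHANDLED_THRESHOLD
        self.count = 0
        self.unhandled = 0
        if stuck:
            # The real kernel reports "irq N: nobody cared" here and masks
            # the line; it stays masked until a handler is registered again.
            self.disabled = True
        return stuck

    def reregister(self) -> None:
        """Model registering a handler again: the line is usable once more."""
        self.disabled = False
        self.count = 0
        self.unhandled = 0
```

In this model a storm where no handler ever claims the interrupt trips the
detector at the end of the first window, while a line where a handler
claims even 2% of the interrupts never does.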

The fact that my upstream ports are using MSI seems irrelevant.

After adding debugging to the vfio-pci interrupt handler, I can see it's
correctly deferring the interrupt, as the GPU device is not identifying
itself as the source of the interrupt via the status register.  In fact,
setting the disable INTx bit in the GPU command register while the
interrupt storm occurs does not stop the interrupts.

The interrupt storm does seem to be related to the bus resets, but I
can't yet figure out how having multiple devices per switch factors
into the issue.  Serializing all bus resets via a mutex doesn't seem
to change the behavior.

I'm still investigating, but if anyone knows how to get access to the
Broadcom datasheet or errata for this switch, please let me know.
Thanks,

Alex