On 16/09/21 4:32 am, Alex Williamson wrote:
> On Wed, 15 Sep 2021 16:44:38 +1200
> Matthew Ruffell <matthew.ruffell@xxxxxxxxxxxxx> wrote:
>> On 15/09/21 4:43 am, Alex Williamson wrote:
>>>
>>> FWIW, I have access to a system with an NVIDIA K1 and M60, both use
>>> this same switch on-card and I've not experienced any issues assigning
>>> all the GPUs to a single VM. Topo:
>>>
>>> +-[0000:40]-+-02.0-[42-47]----00.0-[43-47]--+-08.0-[44]----00.0
>>> |                                           +-09.0-[45]----00.0
>>> |                                           +-10.0-[46]----00.0
>>> |                                           \-11.0-[47]----00.0
>>> \-[0000:00]-+-03.0-[04-07]----00.0-[05-07]--+-08.0-[06]----00.0
>>>                                             \-10.0-[07]----00.0
>
>
> I've actually found that the above configuration, assigning all 6 GPUs
> to a VM reproduces this pretty readily by simply rebooting the VM. In
> my case, I don't have the panic-on-warn/oops that must be set on your
> kernel, so the result is far more benign, the IRQ gets masked until
> it's re-registered.
>
> The fact that my upstream ports are using MSI seems irrelevant.

Hi Alex,

It is good news that you can reproduce an interrupt storm locally.
Did a single reboot trigger the storm, or did you have to reboot the VM
in a loop a few times?

On our system, if we don't have panic-on-warn/oops set, the system
eventually grinds to a halt and locks up, so we reset early, on the
first oops. Even then, we still get stuck in the crashkernel while it
copies the IR tables from DMAR.

>
> Adding debugging to the vfio-pci interrupt handler, it's correctly
> deferring the interrupt as the GPU device is not identifying itself as
> the source of the interrupt via the status register. In fact, setting
> the disable INTx bit in the GPU command register while the interrupt
> storm occurs does not stop the interrupts.
>

Interesting. So could the interrupts be coming from the PEX switch
itself?

We did a run with DisIntx+ set on the PEX switches, but it didn't make
any difference.
Serial log showing DisIntx+ and full dmesg below:
https://paste.ubuntu.com/p/n3XshCxPT8/

> The interrupt storm does seem to be related to the bus resets, but I
> can't figure out yet how multiple devices per switch factors into the
> issue. Serializing all bus resets via a mutex doesn't seem to change
> the behavior.

Very interesting indeed.

> I'm still investigating, but if anyone knows how to get access to the
> Broadcom datasheet or errata for this switch, please let me know.

I have tried reaching out to Broadcom to ask for the datasheet and
errata, but I am unsure whether they will get back to me.

They list the errata as publicly available on their website, under the
Documentation > Errata tab.

https://www.broadcom.com/products/pcie-switches-bridges/pcie-switches/pex8749#documentation

The file "PEX 8749/48/47/33/32/25/24/23/17/16/13/12 Errata" seems to be
missing, though.

https://docs.broadcom.com/docs/PEX8749-48-47-33-32-25-24-23-17-16-13-12%20Errata-and-Cautions

An Intel document discusses the errata for the PEX 8749:

https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/rn/rn-ias-n3000-n.pdf

It links to the following URL, which is also missing:

https://docs.broadcom.com/docs/pub-005018

I did, however, find an older errata document:

PEX 87xx Errata Version 1.14, September 25, 2015
https://docs.broadcom.com/doc/pub-005017

I will keep trying, and I will let you know if we manage to come across
any documents.

Thank you for your efforts.
Matthew

> Thanks,
> Alex
>