On Mon, 15 Jul 2024 15:10:01 +0100,
Johan Hovold <johan@xxxxxxxxxx> wrote:
>
> On Mon, Jul 15, 2024 at 01:58:13PM +0100, Marc Zyngier wrote:
> > On Mon, 15 Jul 2024 12:18:47 +0100,
> > Johan Hovold <johan@xxxxxxxxxx> wrote:
> > > On Sun, Jun 23, 2024 at 05:18:31PM +0200, Thomas Gleixner wrote:
> > > > This is version 4 of the series to convert ARM MSI handling over to
> > > > per device MSI domains.
> > >
> > > This series only showed up in linux-next last Friday and broke interrupt
> > > handling on Qualcomm platforms like sc8280xp (e.g. Lenovo ThinkPad X13s)
> > > and x1e80100 that use the GIC ITS for PCIe MSIs.
> > >
> > > I've applied the series (21 commits from linux-next) on top of 6.10 and
> > > can confirm that the breakage is caused by commits:
> > >
> > >     3d1c927c08fc ("irqchip/gic-v3-its: Switch platform MSI to MSI parent")
> > >     233db05bc37f ("irqchip/gic-v3-its: Provide MSI parent for PCI/MSI[-X]")
> > >
> > > Applying the series up until the change before 3d1c927c08fc unbreaks the
> > > wifi on one machine:
> > >
> > >     ath11k_pci 0006:01:00.0: failed to enable msi: -22
> > >     ath11k_pci 0006:01:00.0: probe with driver ath11k_pci failed with error -22
> > >
> > > and backing up until the commit before 233db05bc37f makes the NVMe come
> > > up again during boot on another.
> > >
> > > I have not tried to debug this further.
> >
> > I need a few things from you though, because you're not giving much to
> > help you (and I'm travelling, which doesn't help).
>
> Yeah, this was just an early heads up.
>
> > Can you at least investigate what in ath11k_pci_alloc_msi() causes the
> > wifi driver to be upset? Does it normally use a single MSI vector or
> > MSI-X? How about your nVME device?
>
> It uses multiple vectors, but now it falls back to trying to allocate a
> single one and even that fails with -ENOSPC:
>
>     ath11k_pci 0006:01:00.0: ath11k_pci_alloc_msi - requesting one vector failed: -28
>
> Similar for the NVMe, it uses multiple vectors normally, but now only
> the AER interrupts appears to be allocated for each controller and there
> is a GICv3 interrupt for the NVMe:
>
>     208:    0  0  0  0  0  0  0  0  ITS-PCI-MSI-0006:00:00.0    0 Edge   PCIe PME, aerdrv
>     212:    0  0  0  0  0  0  0  0  ITS-PCI-MSI-0004:00:00.0    0 Edge   PCIe PME, aerdrv
>     214:  161  0  0  0  0  0  0  0  GICv3                     562 Level  nvme0q0, nvme0q1
>     215:    0  0  0  0  0  0  0  0  ITS-PCI-MSI-0002:00:00.0    0 Edge   PCIe PME, aerdrv
>

That's an indication of the driver having failed its MSI allocation
and gone back to INTx signalling.

> Next boot, after disabling PCIe controller async probing, it's an MSI-X?!:
>
>     201:    0  0  0  0  0  0  0  0  ITS-PCI-MSI-0006:00:00.0    0 Edge   PCIe PME, aerdrv
>     203:    0  0  0  0  0  0  0  0  ITS-PCI-MSI-0004:00:00.0    0 Edge   PCIe PME, aerdrv
>     205:    0  0  0  0  0  0  0  0  ITS-PCI-MSI-0002:00:00.0    0 Edge   PCIe PME, aerdrv
>     206:    0  0  0  0  0  0  0  0  ITS-PCI-MSIX-0002:01:00.0   0 Edge   nvme0q0
>

So is this issue actually tied to the async probing? Does it always
work if you disable it?

> This time ath11k vector allocation succeeded, but the driver times out
> eventually:
>
>     [    8.984619] ath11k_pci 0006:01:00.0: MSI vectors: 32
>     [   29.690841] ath11k_pci 0006:01:00.0: failed to power up mhi: -110
>     [   29.697136] ath11k_pci 0006:01:00.0: failed to start mhi: -110
>     [   29.703153] ath11k_pci 0006:01:00.0: failed to power up :-110
>     [   29.732144] ath11k_pci 0006:01:00.0: failed to create soc core: -110
>     [   29.738694] ath11k_pci 0006:01:00.0: failed to init core: -110
>     [   32.841758] ath11k_pci 0006:01:00.0: probe with driver ath11k_pci failed with error -110
>
> > It would also help if you could define the DEBUG symbol at the very
> > top of irq-gic-v3-its.c and report the debug information that the ITS
> > driver dumps.
>
> See below (with synchronous probing of the pcie controllers).

I don't see much going wrong there, and the ITS driver correctly
dishes out interrupts.

I'll take the current -next for a ride on my own HW and see what
happens.

	M.

--
Without deviation from the norm, progress is not possible.