On Sat, 29 Jun 2024 10:50:33 +0100,
Marc Zyngier <maz@xxxxxxxxxx> wrote:
>
> On Sat, 29 Jun 2024 10:42:35 +0100,
> Marc Zyngier <maz@xxxxxxxxxx> wrote:
> >
> > On Sat, 29 Jun 2024 09:37:59 +0100,
> > Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jun 28 2024 at 23:24, Catalin Marinas wrote:
> > > > I just noticed guests (under KVM) failing to boot on my TX2 with your
> > > > latest branch. I bisected to this patch as the first bad commit.
> > > >
> > > > I'm away this weekend, so won't have time to dive deeper. It looks like
> > > > the CPU is stuck in do_idle() (no timer interrupts?). Also sysrq did not
> > > > seem able to get the stack trace on the other CPUs. It fails both with a
> > > > single or multiple CPUs in the same way place (shortly before mounting
> > > > the rootfs and starting user space).
> > >
> > > From the RH log it's clear that PCI interrupts are not delivered.
> > >
> > > > I'll drop your branch from the arm64 for-kernelci for now and have a
> > > > look again on Monday.
> > >
> > > I stare too. Unfortunately I don't have access to such hardware :(
> >
> > On the face of it, the LPIs are never unmasked (grepping in
> > /sys/kernel/debug/kvm/*/vgic-state):
> >
> > Distributor
> > ===========
> > vgic_model:	GICv3
> > nr_spis:	32
> > nr_lpis:	7
> > enabled:	1
> >
> > P=pending_latch, L=line_level, A=active
> > E=enabled, H=hw, C=config (level=1, edge=0)
> > G=group
> >
> > VCPU 0 TYP   ID TGT_ID PLAEHCG     HWID   TARGET SRC PRI VCPU_ID
> > ----------------------------------------------------------------
> > [...]
> >        LPI 8192      0 1000001        0        0   0 160      -1
> >        LPI 8193      1 0000001        0        0   0 160      -1
> >        LPI 8194      2 0000001        0        0   0 160      -1
> >        LPI 8256      3 0000001        0        0   0 160      -1
> >        LPI 8257      4 0000001        0        0   0 160      -1
> >        LPI 8320      5 0000001        0        0   0 160      -1
> >        LPI 8321      6 1000001        0        0   0 160      -1
> >
> > 8192 and 8321 are pending, but never enabled.
> >
> > This is further confirmed by placing traces in the guest. Now trying
> > to find my way through the new maze of callbacks, because something is
> > clearly missing there.
>
> This is clearly related to MSI_FLAG_PCI_MSI_MASK_PARENT which is not
> seen as being set from cond_unmask_parent(), and ignoring this
> condition results in a booting VM.
>
> I have the ugly feeling that the flag is applied at the wrong level,
> or not propagated.

Here's a possible fix. Making the masking at the ITS level optional is
not an option (haha). It is the PCI masking that is totally
superfluous and that could completely be elided.

With this hack, I can boot a GICv3+ITS guest as usual.

	M.

diff --git a/drivers/irqchip/irq-gic-v3-its-msi-parent.c b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
index 21daa452ffa6d..b66e64eaae440 100644
--- a/drivers/irqchip/irq-gic-v3-its-msi-parent.c
+++ b/drivers/irqchip/irq-gic-v3-its-msi-parent.c
@@ -10,13 +10,13 @@
 #include "irq-gic-common.h"
 #include "irq-msi-lib.h"
 
-#define ITS_MSI_FLAGS_REQUIRED  (MSI_FLAG_USE_DEF_DOM_OPS |     \
-                                 MSI_FLAG_USE_DEF_CHIP_OPS)
+#define ITS_MSI_FLAGS_REQUIRED  (MSI_FLAG_USE_DEF_DOM_OPS |     \
+                                 MSI_FLAG_USE_DEF_CHIP_OPS |    \
+                                 MSI_FLAG_PCI_MSI_MASK_PARENT)
 
 #define ITS_MSI_FLAGS_SUPPORTED (MSI_GENERIC_FLAGS_MASK |       \
                                  MSI_FLAG_PCI_MSIX      |       \
-                                 MSI_FLAG_MULTI_PCI_MSI |       \
-                                 MSI_FLAG_PCI_MSI_MASK_PARENT)
+                                 MSI_FLAG_MULTI_PCI_MSI)
 
 #ifdef CONFIG_PCI_MSI
 static int its_pci_msi_vec_count(struct pci_dev *pdev, void *data)

-- 
Without deviation from the norm, progress is not possible.