[+cc Marc, Thomas, Manivannan, Krishna because there's recent
discussion about changing MSI IRQ affinity late in shutdown:
https://lore.kernel.org/linux-pci/?q=b%3Amsi+b%3Aaffinity+]

On Wed, Sep 28, 2022 at 12:01:55PM -0700, Kevin Rowland wrote:
> Hello all.
>
> I'm running a custom kernel based on 5.10.72 on an NXP i.MX8QM (an
> ARMv8-A system). The i.MX8QM has 2 designware PCIe 3.0 controllers
> on-chip. The driver in use is `drivers/pci/controller/dwc/pci-imx6.c`.
>
> By default the PCIe controller's Link Capabilities are set to support
> ASPM L0s only, although the datasheet indicates support for both L0s
> and L1. I recently modified the host controller driver to set LnkCap
> to indicate ASPM L1 and L0s; one controller is attached to an SSD that
> indicates support for L1s, so changing the host controller LnkCap
> allowed the `aspm_enabled` bitmask to become non-zero and the ASPM
> bits to be set in the LnkCtrl registers on the host and the SSD. This
> is the good news.
>
> The bad news is that after the LnkCap change, when I attempt to
> suspend-to-RAM, I get an asynchronous exception (an SError to be
> specific) at the CPU, resulting in a kernel panic. Here's the tail end
> of the pstore log that was captured from the exception and resulting
> panic:
> ```
> <6>[ 90.850564] psci: CPU3 killed (polled 4 ms)
> <4>[ 90.851989] IRQ 75: no longer affine to CPU4
> <4>[ 90.852062] IRQ384: set affinity failed(-22).
> <4>[ 90.852065] IRQ386: set affinity failed(-22).
> <5>[ 90.852112] CPU4: shutdown
> <6>[ 90.853135] psci: CPU4 killed (polled 0 ms)
> <4>[ 90.856402] IRQ384: set affinity failed(-22).
> <4>[ 90.856409] Eff. affinity 3-5 of IRQ 386 contains only offline
> CPUs after offlining CPU 5
> <2>[ 90.856536] SError Interrupt on CPU5, code 0xbf000002 -- SError
> <4>[ 90.856538] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> C O 5.10.72-lts-5.10.y+g886272a218dd #1
> <4>[ 90.856539] Hardware name: Freescale i.MX8QM, Rivian TCM Board X1 (DT)
> <4>[ 90.856541] pstate: 00000085 (nzcv daIf -PAN -UAO -TCO BTYPE=--)
> <4>[ 90.856542] pc : msi_set_mask_bit.isra.0+0x40/0x90
> <4>[ 90.856544] lr : msi_set_mask_bit.isra.0+0x34/0x90
> <4>[ 90.856545] sp : ffff800012743bf0
> <4>[ 90.856546] x29: ffff800012743bf0 x28: 0000000000000005
> <4>[ 90.856550] x27: ffff000806260218 x26: ffff8000121c9260
> <4>[ 90.856553] x25: ffff800011d1b000 x24: ffff8000121c9480
> <4>[ 90.856556] x23: ffff80001225a768 x22: ffff000806260260
> <4>[ 90.856559] x21: ffff0008062602dc x20: ffff000806260200
> <4>[ 90.856562] x19: ffff00080286b200 x18: 0000000000000020
> <4>[ 90.856565] x17: 0000000000000001 x16: 0000000000000019
> <4>[ 90.856567] x15: ffff0008023812f8 x14: 6e696e696c66666f
> <4>[ 90.856570] x13: 2072657466612073 x12: 55504320656e696c
> <4>[ 90.856572] x11: 66666f20796c6e6f x10: 20736e6961746e6f
> <4>[ 90.856575] x9 : ffff8000106aa084 x8 : 4920666f20352d33
> <4>[ 90.856578] x7 : 207974696e696666 x6 : 0000000000000000
> <4>[ 90.856581] x5 : 0000000000000000 x4 : 0000000000000000
> <4>[ 90.856584] x3 : 0000000000000001 x2 : ffff800012d6302c
> <4>[ 90.856586] x1 : 0000000000000001 x0 : 0000000000000000
> <0>[ 90.856589] Kernel panic - not syncing: Asynchronous SError Interrupt
> <4>[ 90.856591] CPU: 5 PID: 37 Comm: migration/5 Tainted: G
> C O 5.10.72-lts-5.10.y+g886272a218dd #1
> <4>[ 90.856592] Hardware name: Freescale i.MX8QM
> <4>[ 90.856594] Call trace:
> <4>[ 90.856595] dump_backtrace+0x0/0x1b0
> <4>[ 90.856596] show_stack+0x24/0x30
> <4>[ 90.856597] dump_stack+0xd0/0x12c
> <4>[ 90.856598] panic+0x178/0x380
> <4>[ 90.856600] nmi_panic+0x98/0xa0
> <4>[ 90.856601] arm64_serror_panic+0x8c/0x98
> <4>[ 90.856602] do_serror+0x64/0x6c
> <4>[ 90.856603] el1_error+0x90/0x110
> <4>[ 90.856604] msi_set_mask_bit.isra.0+0x40/0x90
> <4>[ 90.856605] pci_msi_mask_irq+0x2c/0x40
> <4>[ 90.856606] dw_msi_mask_irq+0x24/0x40
> <4>[ 90.856607] irq_shutdown+0xa4/0xe0
> <4>[ 90.856609] irq_shutdown_and_deactivate+0x24/0x3c
> <4>[ 90.856610] irq_migrate_all_off_this_cpu+0x260/0x290
> <4>[ 90.856611] __cpu_disable+0xd8/0xf0
> <4>[ 90.856612] take_cpu_down+0x48/0xf0
> <4>[ 90.856613] multi_cpu_stop+0xb4/0x1a0
> <4>[ 90.856614] cpu_stopper_thread+0xa0/0x130
> <4>[ 90.856615] smpboot_thread_fn+0x25c/0x290
> <4>[ 90.856616] kthread+0x164/0x16c
> <4>[ 90.856617] ret_from_fork+0x10/0x30
> <2>[ 91.856635] SMP: stopping secondary CPUs
> <4>[ 91.856636] SMP: failed to stop secondary CPUs 0
> <0>[ 91.856637] Kernel Offset: disabled
> <0>[ 91.856639] CPU features: 0x0240022,2100600c
> <0>[ 91.856640] Memory Limit: none
> ```
>
> Note that the exception occurs right around when we're masking MSIs,
> which happens because CPU5 goes offline and the kernel recognizes that
> there are no more CPUs left to handle those interrupts. To be a little
> more specific, the PC indicates that the CPU was executing a data
> memory barrier when the exception arrived. Source here [1] and
> disassembly below with my annotation showing the PC when the exception
> hit:
>
> ```
> 1b64: b9005260 str w0, [x19, #80]
> asm volatile(ALTERNATIVE("ldr %w0, [%1]",
> 1b68: f9403260 ldr x0, [x19, #96]
> 1b6c: b9400000 ldr w0, [x0]
> readl(desc->mask_base); /* Flush write to
> device */
> 1b70: d5033dbf dmb ld <--- PC during exception
> 1b74: 2a0003e0 mov w0, w0
> 1b78: ca000000 eor x0, x0, x0
> ```
>
> At the point of the exception I believe that the PCIe controller is
> powered down, although I haven't confirmed.
>
> - - -
>
> I'm trying to understand what's going wrong but I've hit a wall. I
> thought the act of writing to the MSI_MASK bit while the HC is powered
> down is what caused the issue, but I hacked a fix to avoid calling
> `msi_set_mask_bit()` during suspend-to-RAM and still got the
> exception.
>
> At this point I'm wondering why we mask MSIs so late in the suspend
> process (right when the last non-boot CPU is taken offline). Shouldn't
> we disable/mask these IRQs as part of host controller suspend?
>
> I'm also wondering if maybe the PCIe _device_ - the SSD - is writing
> into host controller memory after the HC is powered down, which could
> cause an exception on the data bus that would obviously be
> asynchronous to the CPU. But I've just started learning about PCIe and
> I have a very fuzzy understanding of how data flows during an MSI.
>
> Any advice would be welcome.
>
> Thanks,
> Kevin
>
> [1] https://source.codeaurora.org/external/imx/linux-imx/tree/drivers/pci/msi.c?h=lf-5.10.y#n243