On Tue, Mar 04, 2025 at 10:19:07PM +0530, Naveen Kumar P wrote:
> On Tue, Mar 4, 2025 at 1:35 PM Naveen Kumar P
> <naveenkumar.parna@xxxxxxxxx> wrote:
> ...
> For this test run, I removed all three parameters (pcie_aspm=off,
> pci=nomsi, and pcie_ports=on) and booted with the following kernel
> command line arguments:
>
> cat /proc/cmdline
> BOOT_IMAGE=/vmlinuz-6.13.0+ root=/dev/mapper/vg00-rootvol ro quiet
> "dyndbg=file drivers/pci/* +p; file drivers/acpi/bus.c +p; file
> drivers/acpi/osl.c +p"
>
> This time, the issue occurred earlier, at 22998 seconds. Below is the
> relevant dmesg log during the ACPI_NOTIFY_BUS_CHECK event. The
> complete log is attached (dmesg_march4th_log.txt).
>
> [22998.536705] ACPI: \_SB_.PCI0.RP01: ACPI: ACPI_NOTIFY_BUS_CHECK event
> [22998.536753] ACPI: \_SB_.PCI0.RP01: ACPI: OSL: Scheduling hotplug
> event 0 for deferred handling
> [22998.536934] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bridge acquired in
> hotplug_event()
> [22998.536972] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [22998.537002] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Checking bridge in
> hotplug_event()
> [22998.537024] PCI READ: res=0, bus=01 dev=00 func=0 pos=0x00 len=4
> data=0x55551556
> [22998.537066] PCI READ: res=0, bus=01 dev=00 func=0 pos=0x00 len=4
> data=0x55551556

Fine again.

> [22998.537094] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Enabling slot in
> acpiphp_check_bridge()
> [22998.537155] ACPI: Device [PXSX] status [0000000f]
> [22998.537206] ACPI: Device [D015] status [0000000f]
> [22998.537276] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Releasing bridge
> in hotplug_event()
>
> sudo lspci -xxx -s 01:00.0 | grep 10:
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Obviously a problem.  Can you start including the whole
"lspci -x -s 01:00.0" output?  Obviously the Vendor ID reads above
worked fine.  I *assume* it's still fine here, and only the BARs are
zeroed out?

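One way to answer the "only the BARs are zeroed out?" question from a
saved "lspci -x -s 01:00.0" capture is a small script along these
lines.  This is only a sketch and not part of the original thread: the
function names are made up for illustration, and it assumes a type 0
(endpoint) header, i.e. Vendor ID at offset 0x00, Device ID at 0x02,
and six 32-bit BARs at 0x10-0x27.

```python
import re
import struct

# A hex line from `lspci -x` looks like "10: 00 00 00 ...".
# The device description line ("01:00.0 ...") does not match this.
HEX_LINE = re.compile(r"^([0-9a-f]+):((?: [0-9a-f]{2})+)\s*$")


def parse_lspci_hexdump(text):
    """Collect the hex lines of `lspci -x` output into a bytes object."""
    config = bytearray()
    for line in text.splitlines():
        m = HEX_LINE.match(line.strip().lower())
        if not m:
            continue  # description line, blank line, etc.
        offset = int(m.group(1), 16)
        if offset != len(config):
            raise ValueError(f"unexpected offset {offset:#x} in hexdump")
        config += bytes.fromhex(m.group(2).replace(" ", ""))
    return bytes(config)


def summarize(config):
    """Report IDs and BARs, assuming a type 0 configuration header."""
    if len(config) < 0x28:
        raise ValueError("need at least 0x28 bytes of config space")
    # Config space is little-endian.
    vendor, device = struct.unpack_from("<HH", config, 0x00)
    bars = list(struct.unpack_from("<6I", config, 0x10))
    return {
        "vendor": vendor,
        "device": device,
        "bars": bars,
        "bars_all_zero": not any(bars),
    }
```

Feeding it the captured text, e.g.
summarize(parse_lspci_hexdump(open("capture.txt").read())) where
capture.txt is a hypothetical file name, gives the IDs and BAR values
in one place, so a good Vendor ID with all-zero BARs is easy to spot.
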
I assume you saw no new dmesg logs about config accesses to the device
before the lspci.  If you instrumented the user config accessors
(pci_user_read_config_*(), also in access.c), you should see those
accesses.

You could sprinkle some calls to early_dump_pci_device() through the
acpiphp path.  Turn off the kernel config access tracing when you do
this so it doesn't clutter things up.

What is this device?  Is it a shipping product?  Do you have good
confidence that the hardware is working correctly?  I guess you said
it works correctly on a different machine with an older kernel.  I
would swap the cards between machines in case one card is broken.

You could try bisecting between the working kernel and the broken one.
It's kind of painful since it takes so long to reproduce the problem.

Bjorn