[+cc linux-acpi]

On Wed, Feb 19, 2025 at 05:52:47PM +0530, Naveen Kumar P wrote:
> Hi all,
>
> I am writing to seek assistance with an issue we are experiencing with
> a PCIe device (PLDA Device 5555) connected through PCI Express Root
> Port 1 to the host bridge.
>
> We have observed that after booting the system, the Base Address
> Register (BAR0) memory of this device gets reset to 0x0 after
> approximately one hour or more (the timing is inconsistent). This was
> verified using the lspci output and the setpci -s 01:00.0
> BASE_ADDRESS_0 command.
>
> To diagnose the issue, we checked the dmesg log, but it did not
> provide any relevant information. I then enabled dynamic debugging for
> the PCI subsystem (drivers/pci/*) and noticed the following messages
> related ACPI hotplug in the dmesg log:
>
> [ 0.465144] pci 0000:01:00.0: reg 0x10: [mem 0xb0400000-0xb07fffff]
> ...
> [ 6710.000355] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [ 7916.250868] perf: interrupt took too long (4072 > 3601), lowering
> kernel.perf_event_max_sample_rate to 49000
> [ 7984.719647] perf: interrupt took too long (5378 > 5090), lowering
> kernel.perf_event_max_sample_rate to 37000
> [11051.409115] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [11755.388727] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [12223.885715] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> [14303.465636] ACPI: \_SB_.PCI0.RP01: acpiphp_glue: Bus check in hotplug_event()
> After these messages appear, reading the device BAR memory results in
> 0x0 instead of the expected value.
>
> I would like to understand the following:
>
> 1. What could be causing these hotplug_event debug messages?

This is an ACPI Notify event.  Basically the platform is telling us to
re-enumerate the hierarchy below RP01 because a device might have been
added or removed.

Unfortunately the only real information we get is the ACPI device
(RP01) and the notification value (ACPI_NOTIFY_BUS_CHECK).

You could instrument acpiphp_check_bridge() to see what path we take.
The main paths look like enable_slot() or disable_slot(), but those
both include a pr_debug() that you apparently don't see.

A remove followed by add would definitely reset the device, including
its BARs.  But you would normally see some messages related to
enumerating a new device.

If this doesn't help, try to reproduce the problem with a recent
kernel, e.g., v6.13, and post the complete dmesg log.

> 2. Why does this result in the BAR memory being reset?
> 3. How can we resolve this issue?
>
> I have verified that the issue occurs even without loading the driver
> for the PLDA Device 5555, so it does not appear to be related to the
> device driver.
>
> Any help or guidance on debugging this issue would be greatly appreciated.
>
> Thank you for your assistance.
>
> Best regards,
> Naveen
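
To narrow down which Bus check notification (if any) coincides with
BAR0 being cleared, it might help to poll the BAR from userspace and
log the moment it changes.  Below is a minimal sketch, not a tested
tool.  It assumes the device is still at 0000:01:00.0 and that its
config space is readable through the standard sysfs "config" file.
The function names (bar0_from_config, read_bar0, watch) are made up
for illustration, and the CLOCK_MONOTONIC timestamps it prints are
only roughly comparable to the dmesg ones.

```python
import struct
import time

# Assumption: the device from the report, 01:00.0 in PCI domain 0000.
CONFIG_PATH = "/sys/bus/pci/devices/0000:01:00.0/config"
BAR0_OFFSET = 0x10  # offset of BAR0 in the standard PCI config header


def bar0_from_config(cfg: bytes) -> int:
    """Decode the raw 32-bit little-endian BAR0 register from config bytes."""
    if len(cfg) < BAR0_OFFSET + 4:
        raise ValueError("config space read too short to contain BAR0")
    return struct.unpack_from("<I", cfg, BAR0_OFFSET)[0]


def read_bar0(path: str = CONFIG_PATH) -> int:
    """Read the first 64 bytes of config space from sysfs and return BAR0."""
    with open(path, "rb") as f:
        return bar0_from_config(f.read(64))


def watch(path: str = CONFIG_PATH, interval: float = 1.0) -> None:
    """Print a line each time BAR0 changes or the sysfs file is unreadable."""
    last = object()  # sentinel so the first reading is always printed
    while True:
        try:
            cur = f"{read_bar0(path):#010x}"
        except (OSError, ValueError) as exc:
            # The sysfs file goes away if the PCI core removes the device.
            cur = f"unreadable ({exc})"
        if cur != last:
            # CLOCK_MONOTONIC is only roughly comparable to dmesg timestamps.
            now = time.clock_gettime(time.CLOCK_MONOTONIC)
            print(f"[{now:12.6f}] BAR0 = {cur}", flush=True)
            last = cur
        time.sleep(interval)
```

Calling watch() and leaving it running until the problem reproduces
should show whether BAR0 goes to 0x0 right at a Bus check message or
at some unrelated time, and whether the device briefly disappears from
sysfs (which would suggest a remove followed by an add).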