We have an out-of-tree driver for this device, but to eliminate the
driver's role in this issue, I renamed the driver to prevent it from
loading automatically after rebooting the machine. Despite not using
the driver, the issue still occurred.
>
> > $ sudo lspci -xxx -s 01:00.0 | grep "10:"
> > 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >
> > I am not sure why lspci initially showed all FF's and then the next
> > run showed BAR0 reset.
> > Complete sudo lspci -xxx -s 01:00.0 output is captured in the attached
> > dmesg_log_pci_bar_reset.txt file.
> >
> > /sys/firmware/acpi/interrupts/gpe01: 1 EN enabled unmasked
> > /sys/firmware/acpi/interrupts/gpe02: 1 EN enabled unmasked
> >
> >
> > 5. Debugging Steps:
> >
> > Instrumenting acpiphp_check_bridge() will indicate whether we are
> > enabling or disabling a slot (enable_slot() or disable_slot()). Based
> > on the dmesg log, there is only one ACPI_NOTIFY_BUS_CHECK event, and
> > it is most likely for disable_slot(). However, will instrumenting
> > acpiphp_check_bridge() explain why this is happening without
> > actually removing the PCI PLDA device?
>
> No, it won't explain that. But if there was no add/remove event,
> re-enumeration should be harmless. The objective of instrumentation
> would be to figure out why it isn't harmless in this case.
>
> > 6. Reproduction and Additional Information:
> >
> > We do not see any clear pattern or procedure to reproduce this issue.
> > Once the issue occurs, rebooting the machine resolves it, but it
> > reoccurs after an unpredictable time.
> > We have another identical hardware setup with an older kernel (Ubuntu
> > 16.04.4 LTS, kernel version: 4.4.0-66-generic), and this issue has not
> > been observed so far on that machine.
> > Any additional pointers or suggestions on how to proceed to the root
> > cause of this issue would be greatly appreciated.
>
> You're seeing the problem on v5.4 (Nov 2019), which is much newer than
> v4.4 (Jan 2016). But v5.4 is still really too old to spend a lot of
> time on unless the problem still happens on a current kernel.
>
> Bjorn
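
The two states seen in the 0x10 row of the lspci -xxx dump (all FF's on
one run, all zeros on the next) can be told apart mechanically when
going through many saved dumps. Below is a minimal sketch in Python,
assuming the row format shown above ("10:" followed by 16 hex bytes).
The helper names parse_config_row() and classify_bar0() are
hypothetical and are not part of pciutils. Treating all FF's as "config
reads are not reaching the device" is the usual interpretation, not
something this thread has confirmed for this device.

```python
def parse_config_row(line):
    """Split one `lspci -xxx` row such as '10: 00 00 ...' into (offset, bytes)."""
    offset_text, _, data_text = line.strip().partition(":")
    offset = int(offset_text, 16)
    data = bytes(int(token, 16) for token in data_text.split())
    return offset, data


def classify_bar0(line):
    """Classify BAR0 (the first 4 bytes of the row at config offset 0x10).

    Returns one of:
      'all-ones'   - the register reads 0xFFFFFFFF (all FF's)
      'zero'       - the address bits are 0 (BAR cleared or never assigned)
      'programmed' - the BAR holds a non-zero address
    """
    offset, data = parse_config_row(line)
    if offset != 0x10 or len(data) < 4:
        raise ValueError("expected the row at offset 0x10 with at least 4 bytes")

    # PCI config space is little-endian.
    bar0 = int.from_bytes(data[:4], "little")
    if bar0 == 0xFFFFFFFF:
        return "all-ones"

    # Bit 0 selects I/O (1) or memory (0); mask off the flag bits to get the address.
    address = bar0 & (0xFFFFFFFC if bar0 & 1 else 0xFFFFFFF0)
    return "zero" if address == 0 else "programmed"


if __name__ == "__main__":
    # The row quoted above from the failing machine.
    print(classify_bar0("10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"))
```

Running it on the row quoted above reports the BAR as zero. A dump
taken while lspci shows all FF's would be reported as all-ones, which
makes it easier to see from a series of captures when the device went
from one state to the other.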