On Tue, Jan 12, 2021 at 06:20:55PM +0100, Hinko Kocevar wrote:
> On 1/11/21 11:09 PM, Keith Busch wrote:
>
> Here is the log after applying the patch.
>
> What sticks out are the numerous "VC buffer not found in pci_save_vc_state"
> messages, AFAICT from vc.c pci_save_vc_state(), which I have not spotted
> before:
>
> [dev@bd-cpu18 ~]$ dmesg | grep vc_
> [ 336.960749] pcieport 0000:00:01.1: VC buffer not found in pci_save_vc_state
> [ 338.125683] pcieport 0000:01:00.0: VC buffer not found in pci_save_vc_state
> [ 338.342504] pcieport 0000:02:01.0: VC buffer not found in pci_save_vc_state
> [ 338.569035] pcieport 0000:03:00.0: VC buffer not found in pci_save_vc_state
> [ 338.775696] pcieport 0000:04:01.0: VC buffer not found in pci_save_vc_state
> [ 338.982599] pcieport 0000:04:03.0: VC buffer not found in pci_save_vc_state
> [ 339.189608] pcieport 0000:04:08.0: VC buffer not found in pci_save_vc_state
> [ 339.406232] pcieport 0000:04:0a.0: VC buffer not found in pci_save_vc_state
> [ 339.986434] pcieport 0000:04:12.0: VC buffer not found in pci_save_vc_state

Ah, that's happening because I added the cap caching after the cap buffer
allocation. The patch below on top of the previous should fix those
warnings.

> I do not see the lockup anymore, and the recovery seems to have successfully
> been performed.

Okay, that kind of indicates the frequent capability lookups are taking
a while. We cache other capability offsets for similar reasons in the
past, but I don't recall them ever taking so long that it triggers the
CPU lockup watchdog.

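For readers following along, the ordering problem described above can be
modelled in a few lines of user-space C. This is only a sketch, not kernel
code: every name in it (mock_dev, mock_vc_init, and so on) is hypothetical,
the 0x100 offset is made up, and it assumes the save-buffer allocation
consults the cached VC capability offset, which is how the explanation
above reads.

```c
/*
 * Simplified user-space model, NOT kernel code. All names here
 * (mock_dev, mock_vc_init, ...) are hypothetical illustrations.
 */
#include <stdbool.h>
#include <stdio.h>

struct mock_dev {
	int vc_cap;		/* cached capability offset, 0 = not cached */
	bool has_vc_buf;	/* stands in for the VC save buffer */
};

/* Stand-in for the capability list walk; pretend VC lives at 0x100. */
static int mock_find_vc_cap(void)
{
	return 0x100;
}

/* Cache the offset once so later code need not walk the list again. */
static void mock_vc_init(struct mock_dev *dev)
{
	dev->vc_cap = mock_find_vc_cap();
}

/* Assumption: allocation trusts the cached offset. No offset, no buffer. */
static void mock_allocate_cap_save_buffers(struct mock_dev *dev)
{
	dev->has_vc_buf = dev->vc_cap != 0;
}

/* Returns 0 on success, -1 if the capability is cached but has no buffer. */
static int mock_save_vc_state(const struct mock_dev *dev)
{
	if (!dev->vc_cap)
		return 0;	/* nothing to save */
	if (!dev->has_vc_buf) {
		fprintf(stderr, "VC buffer not found in %s\n", __func__);
		return -1;
	}
	return 0;
}

/* Runs init in either order and reports whether the save succeeds. */
static int mock_probe_and_save(bool cache_before_alloc)
{
	struct mock_dev dev = { 0 };

	if (cache_before_alloc) {
		mock_vc_init(&dev);
		mock_allocate_cap_save_buffers(&dev);
	} else {
		mock_allocate_cap_save_buffers(&dev);
		mock_vc_init(&dev);
	}
	return mock_save_vc_state(&dev);
}
```

Calling mock_probe_and_save(false) mirrors the ordering that produced the
warnings: the buffer is skipped because the offset is not cached yet, and
the later save finds no buffer. mock_probe_and_save(true) mirrors the
reordering in the patch below.
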
---
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 56992a42bac6..a12efa87c7e0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2385,6 +2385,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_ea_init(dev);		/* Enhanced Allocation */
 	pci_msi_init(dev);		/* Disable MSI */
 	pci_msix_init(dev);		/* Disable MSI-X */
+	pci_vc_init(dev);		/* Virtual Channel */
 
 	/* Buffers for saving PCIe and PCI-X capabilities */
 	pci_allocate_cap_save_buffers(dev);
@@ -2401,7 +2402,6 @@ static void pci_init_capabilities(struct pci_dev *dev)
 	pci_aer_init(dev);		/* Advanced Error Reporting */
 	pci_dpc_init(dev);		/* Downstream Port Containment */
 	pci_rcec_init(dev);		/* Root Complex Event Collector */
-	pci_vc_init(dev);		/* Virtual Channel */
 
 	pcie_report_downtraining(dev);
 
--