Re: [PATCH v9 6/9] PCI/bwctrl: Re-add BW notification portdrv as PCIe BW controller

On Fri, 2024-12-06 at 20:31 +0100, Niklas Schnelle wrote:
> On Fri, 2024-12-06 at 19:12 +0100, Niklas Schnelle wrote:
> > On Fri, 2024-10-18 at 17:47 +0300, Ilpo Järvinen wrote:
> > > This mostly reverts the commit b4c7d2076b4e ("PCI/LINK: Remove
> > > bandwidth notification"). An upcoming commit extends this driver,
> > > building the PCIe bandwidth controller on top of it.
> > > 
> > > The PCIe bandwidth notifications were first added in commit
> > > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> > > notification") but later had to be removed. The significant changes
> > > compared with the old bandwidth notification driver include:
> > > 
> ---8<---
> > > ---
> > 
> > Hi Ilpo,
> > 
> > I bisected a v6.13-rc1 boot hang on my personal workstation to this
> > patch. Sadly I don't have many details, such as a panic message, because
> > the boot hangs before any kernel messages appear, or at least they aren't
> > visible long enough to read. I haven't looked into the code yet, as I
> > wanted to raise awareness first. Since the commit doesn't revert cleanly
> > on v6.13-rc1, I also haven't tried that yet.
> > 
> > Here are some details on my system:
> > - AMD Ryzen 9 3900X 
> > - ASRock X570 Creator Motherboard
> > - Radeon RX 5600 XT
> > - Intel JHL7540 Thunderbolt 3 USB Controller (only USB 2 plugged)
> > - Intel 82599 10 Gigabit NIC with SR-IOV enabled with 2 VFs
> > - Intel I211 Gigabit NIC
> > - Intel Wi-Fi 6 AX200
> > - Aquantia AQtion AQC107 NIC
> > 
> > If you have patches or things to try, just ask.
> > 
> > Thanks,
> > Niklas
> > 
> 
> OK, I can now at least confirm that bluntly disabling the new bwctrl
> driver with the diff below on top of v6.13-rc1 circumvents the boot
> hang I'm seeing. So it's definitely this driver.
> 
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 5e10306b6308..6fa54480444a 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -828,7 +828,7 @@ static void __init pcie_init_services(void)
>         pcie_aer_init();
>         pcie_pme_init();
>         pcie_dpc_init();
> -       pcie_bwctrl_init();
> +       /* pcie_bwctrl_init(); */
>         pcie_hp_init();
>  }
> 

Also, here is the full lspci -vvv output from running the above on v6.13-rc1:
https://paste.js.org/9UwQIMp7eSgp

Also note that I have CONFIG_PCIE_THERMAL unset, so the cooling device
portion isn't causing the issue either. Next I guess I should narrow it
down to the specific port where enabling bandwidth monitoring causes the
trouble, though I'm not yet sure how best to do this with this many
devices.
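One way to shortlist candidate ports is to check each function's config space for the Link Bandwidth Notification Capability (LBNC) bit in Link Capabilities, the same bit lspci -vvv prints as "BwNot+" under LnkCap. The sketch below assumes, as an approximation of the portdrv logic, that bwctrl only attaches to Root Ports and Downstream Ports advertising LBNC; the real check may apply further conditions. The helper names find_pcie_cap() and bwctrl_candidate() are hypothetical, and the register offsets mirror include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Register offsets and bits, mirroring include/uapi/linux/pci_regs.h. */
#define PCI_STATUS              0x06
#define PCI_STATUS_CAP_LIST     0x10        /* Capability list present */
#define PCI_CAPABILITY_LIST     0x34        /* Pointer to first capability */
#define PCI_CAP_ID_EXP          0x10        /* PCI Express capability ID */
#define PCI_EXP_FLAGS           0x02        /* PCIe Capabilities register */
#define PCI_EXP_FLAGS_TYPE      0x00f0      /* Device/Port Type field */
#define PCI_EXP_TYPE_ROOT_PORT  0x4
#define PCI_EXP_TYPE_DOWNSTREAM 0x6
#define PCI_EXP_LNKCAP          0x0c        /* Link Capabilities register */
#define PCI_EXP_LNKCAP_LBNC     0x00200000u /* Link Bandwidth Notification Capability */

/* Little-endian config space accessors. */
static uint16_t cfg_rd16(const uint8_t *cfg, size_t off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static uint32_t cfg_rd32(const uint8_t *cfg, size_t off)
{
	return (uint32_t)cfg[off] | ((uint32_t)cfg[off + 1] << 8) |
	       ((uint32_t)cfg[off + 2] << 16) | ((uint32_t)cfg[off + 3] << 24);
}

/* Walk the standard capability list; return the PCIe capability offset or 0. */
static size_t find_pcie_cap(const uint8_t *cfg, size_t len)
{
	if (len < 0x40 || !(cfg_rd16(cfg, PCI_STATUS) & PCI_STATUS_CAP_LIST))
		return 0;

	size_t pos = cfg[PCI_CAPABILITY_LIST] & 0xfc;

	/* Bound the walk so a malformed, looping list cannot hang us. */
	for (int ttl = 48; ttl > 0 && pos >= 0x40 && pos + 2 <= len; ttl--) {
		if (cfg[pos] == PCI_CAP_ID_EXP)
			return pos;
		pos = cfg[pos + 1] & 0xfc;
	}
	return 0;
}

/*
 * Hypothetical helper: true if this function is a Root Port or Downstream
 * Port whose Link Capabilities advertise LBNC, i.e. a port where bwctrl
 * could plausibly enable bandwidth notification interrupts.
 */
bool bwctrl_candidate(const uint8_t *cfg, size_t len)
{
	size_t cap = find_pcie_cap(cfg, len);

	if (cap == 0 || cap + PCI_EXP_LNKCAP + 4 > len)
		return false;

	unsigned int type =
		(cfg_rd16(cfg, cap + PCI_EXP_FLAGS) & PCI_EXP_FLAGS_TYPE) >> 4;
	if (type != PCI_EXP_TYPE_ROOT_PORT && type != PCI_EXP_TYPE_DOWNSTREAM)
		return false;

	return (cfg_rd32(cfg, cap + PCI_EXP_LNKCAP) & PCI_EXP_LNKCAP_LBNC) != 0;
}
```

To use it, read /sys/bus/pci/devices/*/config as root (unprivileged reads return only the first 64 bytes, which usually excludes the PCIe capability) and print the addresses for which bwctrl_candidate() returns true. That list is the set of ports worth testing one at a time.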

Thanks,
Niklas




