On Mon, Feb 17, 2025 at 06:52:58PM +0200, Ilpo Järvinen wrote:
> PCIe BW controller enables BW notifications for Downstream Ports by
> setting Link Bandwidth Management Interrupt Enable (LBMIE) and Link
> Autonomous Bandwidth Interrupt Enable (LABIE) (PCIe Spec. r6.2 sec.
> 7.5.3.7).
>
> It was discovered that performing a reset can lead to the device
> underneath the Downstream Port becoming unavailable if BW notifications
> are left enabled throughout the reset sequence (at least LBMIE was
> found to cause an issue).

What kind of reset? FLR? SBR? This needs to be specified in the
commit message so that the reader isn't forced to sift through a
bugzilla with dozens of comments and attachments.

The commit message should also mention the type of affected device
(Nvidia GPU AD107M [GeForce RTX 4050 Max-Q / Mobile]). The Root Port
above is an AMD one; that may be relevant as well.

> While the PCIe Specifications do not indicate BW notifications could not
> be kept enabled during resets, the PCIe Link state during an
> intentional reset is not of large interest. Thus, disable BW controller
> for the bridge while reset is performed and re-enable it after the
> reset has completed to workaround the problems some devices encounter
> if BW notifications are left on throughout the reset sequence.

This approach won't work if the reset is performed without software
intervention. E.g. if a DPC event occurs, the device likewise
undergoes a reset, but there is no prior system software involvement.
Software only becomes involved *after* the reset has occurred.

I think it needs to be tested whether the same issue occurs with DPC.
It's easy to simulate DPC by setting the Software Trigger bit:

  setpci -s 00:01.1 ECAP_DPC+6.w=40:40

If the issue does occur with DPC, then this fix isn't sufficient.

> Keep a counter for the disable/enable because MFD will execute
> pci_dev_save_and_disable() and pci_dev_restore() back to back for
> sibling devices:
>
> [ 50.139010] vfio-pci 0000:01:00.0: resetting
> [ 50.139053] vfio-pci 0000:01:00.1: resetting
> [ 50.141126] pcieport 0000:00:01.1: PME: Spurious native interrupt!
> [ 50.141133] pcieport 0000:00:01.1: PME: Spurious native interrupt!
> [ 50.441466] vfio-pci 0000:01:00.0: reset done
> [ 50.501534] vfio-pci 0000:01:00.1: reset done

So why are you citing the PME messages here? Are they relevant? Do
they not occur when the bandwidth controller is disabled? If they do
not, they may provide a clue as to what's going on. But that's not
clear from the commit message.

> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -5166,6 +5167,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
>           */
>          pci_set_power_state(dev, PCI_D0);
>
> +        if (bridge)
> +                pcie_bwnotif_disable(bridge);
> +
>          pci_save_state(dev);

Instead of putting this in the PCI core, amend pcie_portdrv_err_handler
with ->reset_prepare and ->reset_done callbacks that call down to all
the port service drivers, then amend bwctrl.c to disable/enable
interrupts in these callbacks. (Rough sketch appended below.)

> +        port->link_bwctrl->disable_count--;
> +        if (!port->link_bwctrl->disable_count) {
> +                __pcie_bwnotif_enable(port);
> +                pci_dbg(port, "BW notifications enabled\n");
> +        }
> +        WARN_ON_ONCE(port->link_bwctrl->disable_count < 0);

So why do you need to count this? IIUC you get two consecutive disable
and two consecutive enable events. If the interrupts are already
disabled, just do nothing. Same for enablement. Any reason this
simpler approach (second sketch below) doesn't work?

Thanks,

Lukas
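
P.S.: To illustrate the portdrv direction, here's a rough sketch of
what I have in mind. It is untested (not even compile-tested), and the
pcie_bwnotif_disable()/_enable() calls are just placeholders for
whatever helpers bwctrl.c ends up providing; the real version would
dispatch to the bound port service drivers the same way the existing
->slot_reset handling does:

/* drivers/pci/pcie/portdrv.c -- rough sketch, not compile-tested */

static void pcie_portdrv_reset_prepare(struct pci_dev *dev)
{
        /*
         * Quiesce port services before the reset is carried out.
         * For bwctrl this means clearing LBMIE/LABIE in the port's
         * Link Control register.  Placeholder call; the real version
         * would iterate over the port services bound to this port.
         */
        pcie_bwnotif_disable(dev);
}

static void pcie_portdrv_reset_done(struct pci_dev *dev)
{
        /* Re-arm BW notifications once the reset has completed. */
        pcie_bwnotif_enable(dev);
}

static const struct pci_error_handlers pcie_portdrv_err_handler = {
        .error_detected = pcie_portdrv_error_detected,
        .slot_reset     = pcie_portdrv_slot_reset,
        .mmio_enabled   = pcie_portdrv_mmio_enabled,
        .resume         = pcie_portdrv_err_resume,
        .reset_prepare  = pcie_portdrv_reset_prepare,
        .reset_done     = pcie_portdrv_reset_done,
};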
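
P.P.S.: Regarding the counter, what I was thinking of for bwctrl.c is
simply bailing out if the bits are already in the requested state,
along these lines (again only a sketch, function name taken from your
patch):

/* drivers/pci/pcie/bwctrl.c -- sketch only */

void pcie_bwnotif_disable(struct pci_dev *port)
{
        u16 lnkctl;

        pcie_capability_read_word(port, PCI_EXP_LNKCTL, &lnkctl);

        /* Already disabled, e.g. by the sibling function's reset?  Done. */
        if (!(lnkctl & (PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE)))
                return;

        pcie_capability_clear_word(port, PCI_EXP_LNKCTL,
                                   PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
}

If there's a race with the back-to-back sibling resets that makes this
insufficient, that would be worth spelling out in the commit message.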