Re: [PATCH 1/1] PCI/bwctrl: Disable PCIe BW controller during reset

On Tue, 18 Feb 2025, Lukas Wunner wrote:

> On Mon, Feb 17, 2025 at 06:52:58PM +0200, Ilpo Järvinen wrote:
> > PCIe BW controller enables BW notifications for Downstream Ports by
> > setting Link Bandwidth Management Interrupt Enable (LBMIE) and Link
> > Autonomous Bandwidth Interrupt Enable (LABIE) (PCIe Spec. r6.2 sec.
> > 7.5.3.7).
> > 
> > It was discovered that performing a reset can lead to the device
> > underneath the Downstream Port becoming unavailable if BW notifications
> > are left enabled throughout the reset sequence (at least LBMIE was
> > found to cause an issue).
> 
> What kind of reset?  FLR?  SBR?  This needs to be specified in the
> commit message so that the reader isn't forced to sift through a
> bugzilla with dozens of comments and attachments.

Heh, I never really tried to figure it out because the reset-disable 
patch was just a stab-in-the-dark kind of patch. To my surprise, it ended 
up working (after the initial confusion was resolved), so I just started 
to prepare this patch based on that knowledge.

Logs do mention this:

[   21.560206] pcieport 0000:00:01.1: unlocked secondary bus reset via: pciehp_reset_slot+0x98/0x140

...so it seems to be SBR.

> The commit message should also mention the type of affected device
> (Nvidia GPU AD107M [GeForce RTX 4050 Max-Q / Mobile]).  The Root Port
> above is an AMD one, that may be relevant as well.

Okay.

> > While the PCIe Specifications do not indicate BW notifications could not
> > be kept enabled during resets, the PCIe Link state during an
> > intentional reset is not of large interest. Thus, disable BW controller
> > for the bridge while reset is performed and re-enable it after the
> > reset has completed to workaround the problems some devices encounter
> > if BW notifications are left on throughout the reset sequence.
> 
> This approach won't work if the reset is performed without software
> intervention.  E.g. if a DPC event occurs, the device likewise undergoes
> a reset but there is no prior system software involvement.  Software only
> becomes involved *after* the reset has occurred.
> 
> I think it needs to be tested if that same issue occurs with DPC.
> It's easy to simulate DPC by setting the Software Trigger bit:
> 
> setpci -s 00:01.1 ECAP_DPC+6.w=40:40
> 
> If the issue does occur with DPC then this fix isn't sufficient.

Looking at the lspci logs, I don't see a DPC capability for 00:01.1?!

> > Keep a counter for the disable/enable because MFD will execute
> > pci_dev_save_and_disable() and pci_dev_restore() back to back for
> > sibling devices:
> > 
> > [   50.139010] vfio-pci 0000:01:00.0: resetting
> > [   50.139053] vfio-pci 0000:01:00.1: resetting
> > [   50.141126] pcieport 0000:00:01.1: PME: Spurious native interrupt!
> > [   50.141133] pcieport 0000:00:01.1: PME: Spurious native interrupt!
> > [   50.441466] vfio-pci 0000:01:00.0: reset done
> > [   50.501534] vfio-pci 0000:01:00.1: reset done
> 
> So why are you citing the PME messages here?  Are they relevant?
> Do they not occur when the bandwidth controller is disabled?
> If they do not, they may provide a clue what's going on.
> But that's not clear from the commit message.

They also occur when the BW controller is disabled for the duration of 
the reset. What I don't currently have at hand is a comparable log from 
the era prior to any BW controller commits (i.e. from 6.12). The one 
currently in the bugzilla has the out-of-tree module loaded.

PME and BW notifications do share the interrupt, so I wouldn't entirely 
discount their relevance. One of my theories (never mentioned anywhere 
until now) was that the extra interrupts generated by the BW 
notifications somehow manage to confuse the PME driver, but I never 
found anything to support that.

I can remove the PME lines though.

> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -5166,6 +5167,9 @@ static void pci_dev_save_and_disable(struct pci_dev *dev)
> >  	 */
> >  	pci_set_power_state(dev, PCI_D0);
> >  
> > +	if (bridge)
> > +		pcie_bwnotif_disable(bridge);
> > +
> >  	pci_save_state(dev);
> 
> Instead of putting this in the PCI core, amend pcie_portdrv_err_handler
> with ->reset_prepare and ->reset_done callbacks which call down to all
> the port service drivers, then amend bwctrl.c to disable/enable
> interrupts in these callbacks.

Will that work? I mean, if the port itself is not reset (0000:00:01.1 in 
this case), do these callbacks get called for it?
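
Just to check I've understood the suggestion, something along these 
lines (rough sketch; only .reset_prepare/.reset_done in struct 
pci_error_handlers are real, the pcie_port_notify_*() helpers below are 
made up for illustration)?

/*
 * Illustrative only: portdrv forwards ->reset_prepare()/->reset_done()
 * to the port services so that bwctrl can mask LBMIE/LABIE around the
 * reset. The pcie_port_notify_*() helpers do not exist (yet).
 */
static void pcie_portdrv_reset_prepare(struct pci_dev *dev)
{
	pcie_port_notify_reset_prepare(dev);	/* bwctrl clears LBMIE/LABIE */
}

static void pcie_portdrv_reset_done(struct pci_dev *dev)
{
	pcie_port_notify_reset_done(dev);	/* bwctrl re-enables them */
}

static const struct pci_error_handlers pcie_portdrv_err_handler = {
	/* existing error callbacks elided */
	.reset_prepare	= pcie_portdrv_reset_prepare,
	.reset_done	= pcie_portdrv_reset_done,
};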

> > +	port->link_bwctrl->disable_count--;
> > +	if (!port->link_bwctrl->disable_count) {
> > +		__pcie_bwnotif_enable(port);
> > +		pci_dbg(port, "BW notifications enabled\n");
> > +	}
> > +	WARN_ON_ONCE(port->link_bwctrl->disable_count < 0);
> 
> So why do you need to count this?  IIUC you get two consecutive
> disable and two consecutive enable events.
> 
> If the interrupts are already disabled, just do nothing.
> Same for enablement.  Any reason this simpler approach
> doesn't work?

If we enabled on restore of the first device, BW notifications would get 
enabled before the last of the devices has been restored. While it might 
work (the test patch, after all, did work without this complexity), IMO 
it seems unwise to create such a racy pattern.

For the disable side, I think it would be possible to always disable 
unconditionally.
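
To spell out the interleaving I'm worried about with a plain on/off flag 
(rough sketch; the __pcie_bwnotif_disable() name below just mirrors the 
enable helper from the patch and the locking is omitted):

/*
 * MFD case, using the devices from the log above (illustrative):
 *
 *   save_and_disable(01:00.0) -> disable(00:01.1)  count 0->1, IRQs masked
 *   save_and_disable(01:00.1) -> disable(00:01.1)  count 1->2
 *   ... reset ...
 *   restore(01:00.0)          -> enable(00:01.1)   count 2->1, still masked
 *   restore(01:00.1)          -> enable(00:01.1)   count 1->0, unmasked here
 *
 * With a plain flag, the restore of 01:00.0 would re-enable the
 * notifications while 01:00.1 is still being reset/restored.
 */
static void pcie_bwnotif_disable(struct pci_dev *port)
{
	/*
	 * Disabling while already disabled is harmless, so this part
	 * could indeed be unconditional; the count only matters for
	 * deciding when to re-enable.
	 */
	__pcie_bwnotif_disable(port);
	port->link_bwctrl->disable_count++;
}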

-- 
 i.
