Re: [PATCH v2] PCI: pciehp: Clear LBMS on hot-remove to prevent link speed reduction

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 17 Jul 2024, Smita Koralahalli wrote:

> On 7/9/2024 3:52 AM, Ilpo Järvinen wrote:
> > On Tue, 25 Jun 2024, Smita Koralahalli wrote:
> > 
> > > Sorry for the delay here. Took some time to find a system to run
> > > experiments.
> > > Comments inline.
> > > 
> > > On 6/19/2024 12:47 AM, Lukas Wunner wrote:
> > > > On Tue, Jun 18, 2024 at 02:23:21PM -0700, Smita Koralahalli wrote:
> > > > > On 6/18/2024 11:51 AM, Smita Koralahalli wrote:
> > > > > > > > > But IIUC LBMS is set by hardware but never cleared by
> > > > > > > > > hardware, so
> > > > > > > > > if
> > > > > > > > > we remove a device and power off the slot, it doesn't seem
> > > > > > > > > like
> > > > > > > > > LBMS
> > > > > > > > > could be telling us anything useful (what could we do in
> > > > > > > > > response
> > > > > > > > > to
> > > > > > > > > LBMS when the slot is empty?), so it makes sense to me to
> > > > > > > > > clear
> > > > > > > > > it.
> > > > > > > > > 
> > > > > > > > > It seems like pciehp_unconfigure_device() does sort of PCI
> > > > > > > > > core
> > > > > > > > > and
> > > > > > > > > driver-related things and possibly could be something shared
> > > > > > > > > by
> > > > > > > > > all
> > > > > > > > > hotplug drivers, while remove_board() does things more
> > > > > > > > > specific to
> > > > > > > > > the
> > > > > > > > > hotplug model (pciehp, shpchp, etc).
> > > > > > > > > 
> > > > > > > > >   From that perspective, clearing LBMS might fit better in
> > > > > > > > > remove_board(). In that case, I wonder whether it should be
> > > > > > > > > done
> > > > > > > > > after turning off slot power? This patch clears is *before*
> > > > > > > > > turning
> > > > > > > > > off the power, so I wonder if hardware could possibly set it
> > > > > > > > > again
> > > > > > > > > before the poweroff?
> > > > > 
> > > > > While clearing LBMS in remove_board() here:
> > > > > 
> > > > > if (POWER_CTRL(ctrl)) {
> > > > > 	pciehp_power_off_slot(ctrl);
> > > > > +	pcie_capability_write_word(ctrl->pcie->port, PCI_EXP_LNKSTA,
> > > > > 				   PCI_EXP_LNKSTA_LBMS);
> > > > > 
> > > > > 	/*
> > > > > 	 * After turning power off, we must wait for at least 1 second
> > > > > 	 * before taking any action that relies on power having been
> > > > > 	 * removed from the slot/adapter.
> > > > > 	 */
> > > > > 	msleep(1000);
> > > > > 
> > > > > 	/* Ignore link or presence changes caused by power off */
> > > > > 	atomic_and(~(PCI_EXP_SLTSTA_DLLSC | PCI_EXP_SLTSTA_PDC),
> > > > > 		   &ctrl->pending_events);
> > > > > }
> > > > > 
> > > > > This can happen too right? I.e Just after the slot poweroff and before
> > > > > LBMS
> > > > > clearing the PDC/PDSC could be fired. Then
> > > > > pciehp_handle_presence_or_link_change() would hit case "OFF_STATE" and
> > > > > proceed with pciehp_enable_slot() ....pcie_failed_link_retrain() and
> > > > > ultimately link speed drops..
> > > > > 
> > > > > So, I added clearing just before turning off the slot.. Let me know if
> > > > > I'm
> > > > > thinking it right.
> > > 
> > > I guess I should have experimented before putting this comment out.
> > > 
> > > After talking to the HW/FW teams, I understood that, none of our CRBs
> > > support
> > > power controller for NVMe devices, which means the "Power Controller
> > > Present"
> > > in Slot_Cap is always false. That's what makes it a "surprise removal." If
> > > the
> > > OS was notified beforehand and there was a power controller attached, the
> > > OS
> > > would turn off the power with SLOT_CNTL. That's an "orderly" removal. So
> > > essentially, the entire block from "if (POWER_CTRL(ctrl))" will never be
> > > executed for surprise removal for us.
> > > 
> > > There could be board designs outside of us, with power controllers for the
> > > NVME devices, which I'm not aware of.
> > > > 
> > > > This was added by 3943af9d01e9 ("PCI: pciehp: Ignore Link State Changes
> > > > after powering off a slot").  You can try reproducing it by writing "0"
> > > > to the slot's "power" file in sysfs, but your hardware needs to support
> > > > slot power.
> > > > 
> > > > Basically the idea is that after waiting for 1 sec, chances are very low
> > > > that any DLLSC or PDSC events caused by removing slot power may still
> > > > occur.
> > > 
> > > PDSC events occurring in our case aren't by removing slot power. It
> > > should/will always happen on a surprise removal along with DLLSC for us.
> > > But
> > > this PDSC is been delayed and happens after DLLSC is invoked and
> > > ctrl->state =
> > > OFF_STATE in pciehp_disable_slot(). So the PDSC is mistook to enable slot
> > > in
> > > pciehp_enable_slot() inside pciehp_handle_presence_or_link_change().
> > > > 
> > > > Arguably the same applies to LBMS changes, so I'd recommend to likewise
> > > > clear stale LBMS after the msleep(1000).
> > > > 
> > > > pciehp_ctrl.c only contains the state machine and higher-level logic of
> > > > the hotplug controller and all the actual register accesses are in
> > > > helpers
> > > > in pciehp_hpc.c.  So if you want to do it picture-perfectly, add a
> > > > helper
> > > > in pciehp_hpc.c to clear LBMS and call that from remove_board().
> > > > 
> > > > That all being said, I'm wondering how this plays together with Ilpo's
> > > > bandwidth control driver?
> > > > 
> > > > https://lore.kernel.org/all/20240516093222.1684-1-ilpo.jarvinen@xxxxxxxxxxxxxxx/
> > > 
> > > I need to yet do a thorough reading of Ilpo's bandwidth control driver.
> > > Ilpo
> > > please correct me if I misspeak something as I don't have a thorough
> > > understanding.
> > > 
> > > Ilpo's bandwidth controller also checks for lbms count to be greater than
> > > zero
> > > to bring down link speeds if CONFIG_PCIE_BWCTRL is true. If false, it
> > > follows
> > > the default path to check LBMS bit in link status register. So if,
> > > CONFIG_PCIE_BWCTRL is disabled by default we continue to see link speed
> > > drops.
> > > Even, if BWCTRL is enabled, LBMS count is incremented to 1 in
> > > pcie_bwnotif_enable() so likely pcie_lbms_seen() might return true thereby
> > > bringing down speeds here as well if DLLLA is clear?
> > 
> > I did add code to clear the LBMS count in pciehp_unconfigure_device() in
> > part thanks to this patch of yours. Do you think it wouldn't work?
> 
> It works with BWCTRL enabled. Just my concern would be to keep the clearing in
> pciehp_unconfigure_device() and not do it inside POWER_CTRL(ctrl), in
> remove_board() as per the suggestions given above.

I added that LBMS count reset based on one of the first versions of your 
patch which had the LBMS clearing in pciehp_unconfigure_device() before 
this discussion went further. The point anyway is that bwctrl change would 
replace the line you put in to clear LBMS, wherever that will be placed 
in the end.

> > But I agree there would still be problem if BWCTRL is not enabled. I
> > already have to keep part of it enabled due to the Target Speed quirk
> > and now this is another case where just having it always on would be
> > beneficial.
> 
> Correct, it should always be on to not see the problem.
>
> Would like to have this "LBMS clearing fix" accepted in sooner since its
> breaking things on our systems! :)

Sure, I've no objection to that. In fact, it's what I prefer myself too 
because it makes things easier for stable folks. :-)

I'd also like to see those Target Speed quick fixes to go in before
bwctl (to the extent I did base my series on top of them) but it seems 
Maciej might not be updating them so perhaps I'll have to take look at 
Bjorn's comments on that series and see what can be done to those patches.

-- 
 i.

[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux