Re: [PATCH v2] PCI/ASPM: Disable L1 before disabling L1ss

On Thu, Oct 03, 2024 at 12:01:22PM -0500, Bjorn Helgaas wrote:
> On Thu, Oct 03, 2024 at 06:55:03PM +0530, Ajay Agarwal wrote:
> > The current sequence in the driver for L1ss update is as follows.
> > 
> > Disable L1ss
> > Disable L1
> > Enable L1ss as required
> > Enable L1 if required
> > 
> > With this sequence, a bus hang is observed during the L1ss
> > disable sequence when the RC CPU attempts to clear the RC L1ss
> > register after clearing the EP L1ss register.
> 
> Thanks for this.  What exactly does the bus hang look like to a user?
>
The CPU hangs while reading the RC PCI_L1SS_CTL1 register. After some
time, the CPU watchdog expires and the system reboots.

> I guess the problem happens in pcie_config_aspm_l1ss(), where we do:
> 
>   pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
>   pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
> 
> where clearing the child (endpoint) PCI_L1SS_CTL1_L1_2_MASK works, but
> something goes wrong when clearing the parent (RP) mask?  The
> clear_and_set will do a read followed by a write, and one of those
> causes some kind of error?
>
During ASPM disable, in pcie_config_aspm_l1ss(), we do:
   1. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)
   2. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
   3. pci_clear_and_set_config_dword(parent->l1ss + PCI_L1SS_CTL1, ... 0)
   4. pci_clear_and_set_config_dword(child->l1ss + PCI_L1SS_CTL1, ... 0)

We observe that steps 1 and 2 complete without issue, but the read of
the PCI_L1SS_CTL1 register in step 3 hangs. I am not sure why.
The issue is difficult to reproduce, and adding prints around these
steps masks it.
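
For reference, each of the steps above is a config read followed by a
config write. Below is a minimal user-space sketch of that
read-modify-write, not the kernel implementation: struct fake_dev and
the fake_*() helpers are hypothetical stand-ins backed by an array, and
the 0x40 capability offset in the usage note is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

#define PCI_L1SS_CTL1            0x08        /* offset within the L1SS capability */
#define PCI_L1SS_CTL1_L1SS_MASK  0x0000000f  /* the four L1SS enable bits */

/* Hypothetical stand-in for a device: 64 dwords of fake config space. */
struct fake_dev {
	uint32_t cfg[64];
	uint16_t l1ss;       /* byte offset of the L1SS capability */
	unsigned int reads;  /* number of config reads issued so far */
};

static uint32_t fake_read(struct fake_dev *d, int pos)
{
	assert(pos >= 0 && pos / 4 < 64);
	d->reads++;  /* in the failure described above, this is the access that hangs */
	return d->cfg[pos / 4];
}

static void fake_write(struct fake_dev *d, int pos, uint32_t val)
{
	assert(pos >= 0 && pos / 4 < 64);
	d->cfg[pos / 4] = val;
}

/* Read the dword, clear the 'clear' bits, set the 'set' bits, write it back. */
static void fake_clear_and_set(struct fake_dev *d, int pos,
			       uint32_t clear, uint32_t set)
{
	uint32_t val = fake_read(d, pos);

	val &= ~clear;
	val |= set;
	fake_write(d, pos, val);
}
```

With a device whose l1ss is 0x40, calling
fake_clear_and_set(&d, d.l1ss + PCI_L1SS_CTL1, PCI_L1SS_CTL1_L1SS_MASK, 0)
issues exactly one read and clears only the low four bits, so even a
"clear" depends on the register being readable at that moment.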

> > It looks like the
> > RC attempts to enter L1ss again and at the same time, access to
> > RC L1ss register fails because aux clk is still not active.
> 
> I assume "access to RC L1ss register fails" means something like
> "reading the Root Port PCI_L1SS_CTL1 register returns ~0" which I
> guess would be the read part of the pci_clear_and_set_config_dword()?
> 
> ~0 data might be returned because of some PCIe error like Unsupported
> Request, Completion Timeout, etc?  Such an error should be logged in
> the AER Capability.
>
This is not a PCIe bus transaction. The CPU on the RC side is accessing
an RC-side config register, so the link is not involved at all. Hence,
no timeout or other AER error is logged or reported. The AXI-DBI bus
just hangs.

> This *sounds* like it would be a hardware defect in the Root Port.
> This register is on the upstream end of the link, so I would think it
> would be readable no matter what state the link is in.
> 
Exactly. As described above, this is not a PCIe transaction.

> Sec 5.5.4 requires that L1 be disabled in PCI_EXP_LNKCTL while
> *setting* either of the ASPM L1 PM Substates enable bits.  I don't see
> anything there about requiring that for *clearing* those enable bits.
> But maybe it is required, and in any event I guess it's simpler to do
> it as you do here and have L1 (indeed *all* ASPM) disabled while
> configuring L1 SS.
> 
Right. The spec does not describe the sequence for clearing these L1ss
bits. But I am interpreting the word "setting" to cover "setting to 1"
as well as "setting to 0".
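
To make that interpretation concrete, here is a simplified user-space
model of the two orderings. It is only a sketch under assumptions:
struct port, struct link, and the config_old()/config_new() helpers are
made up for illustration, there is a single child, and the old path is
reduced to "touch L1ss first, then L1". It counts every L1ss write that
happens while L1 is still enabled on either end of the link.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one end of a link; not a kernel structure. */
struct port {
	bool l1;        /* ASPM L1 enable in Link Control */
	unsigned l1ss;  /* L1SS enable bits in L1SS Control 1 */
};

struct link {
	struct port parent, child;
	int violations; /* L1ss writes made while L1 was still enabled */
};

static void write_l1ss(struct link *lk, struct port *p, unsigned val)
{
	if (lk->parent.l1 || lk->child.l1)
		lk->violations++;
	p->l1ss = val;
}

/* Simplified old order: L1ss is cleared first, while L1 may still be on. */
static void config_old(struct link *lk, bool l1, unsigned l1ss)
{
	write_l1ss(lk, &lk->child, 0);
	write_l1ss(lk, &lk->parent, 0);
	lk->child.l1 = false;
	lk->parent.l1 = false;
	write_l1ss(lk, &lk->parent, l1ss);
	write_l1ss(lk, &lk->child, l1ss);
	lk->parent.l1 = l1;
	lk->child.l1 = l1;
}

/* Proposed order: disable L1 (child, then parent) before any L1ss write. */
static void config_new(struct link *lk, bool l1, unsigned l1ss)
{
	lk->child.l1 = false;
	lk->parent.l1 = false;
	assert(!lk->parent.l1 && !lk->child.l1);
	write_l1ss(lk, &lk->child, 0);
	write_l1ss(lk, &lk->parent, 0);
	write_l1ss(lk, &lk->parent, l1ss);
	write_l1ss(lk, &lk->child, l1ss);
	lk->parent.l1 = l1;   /* enable upstream first ... */
	lk->child.l1 = l1;    /* ... then downstream */
}
```

Starting from a link with L1 and all L1ss bits enabled on both ends and
disabling everything, config_old() performs its first two L1ss writes
with L1 still enabled, whereas config_new() performs none. Both end in
the same final state.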

> > PCIe spec r6.2, section 5.5.4, recommends that setting either
> > or both of the enable bits for ASPM L1 PM Substates must be done
> > while ASPM L1 is disabled. My interpretation here is that
> > clearing L1ss should also be done when L1 is disabled. Thereby,
> > change the sequence as follows.
> > 
> > Disable L1
> > Disable L1ss
> > Enable L1ss as required
> > Enable L1 if required
> > 
> > Signed-off-by: Ajay Agarwal <ajayagarwal@xxxxxxxxxx>
> > ---
> >  drivers/pci/pcie/aspm.c | 50 ++++++++++++++++++++---------------------
> >  1 file changed, 24 insertions(+), 26 deletions(-)
> > 
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index cee2365e54b8..c172886129f3 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -848,17 +848,13 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
> >  /* Configure the ASPM L1 substates */
> >  static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >  {
> > -	u32 val, enable_req;
> > +	u32 val;
> >  	struct pci_dev *child = link->downstream, *parent = link->pdev;
> >  
> > -	enable_req = (link->aspm_enabled ^ state) & state;
> > -
> >  	/*
> > -	 * Here are the rules specified in the PCIe spec for enabling L1SS:
> > +	 * Spec r6.2, section 5.5.4, mentions the rules for enabling L1SS:
> >  	 * - When enabling L1.x, enable bit at parent first, then at child
> >  	 * - When disabling L1.x, disable bit at child first, then at parent
> > -	 * - When enabling ASPM L1.x, need to disable L1
> > -	 *   (at child followed by parent).
> >  	 * - The ASPM/PCIPM L1.2 must be disabled while programming timing
> >  	 *   parameters
> >  	 *
> > @@ -871,16 +867,6 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >  				       PCI_L1SS_CTL1_L1SS_MASK, 0);
> >  	pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
> >  				       PCI_L1SS_CTL1_L1SS_MASK, 0);
> > -	/*
> > -	 * If needed, disable L1, and it gets enabled later
> > -	 * in pcie_config_aspm_link().
> > -	 */
> > -	if (enable_req & (PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)) {
> > -		pcie_capability_clear_word(child, PCI_EXP_LNKCTL,
> > -					   PCI_EXP_LNKCTL_ASPM_L1);
> > -		pcie_capability_clear_word(parent, PCI_EXP_LNKCTL,
> > -					   PCI_EXP_LNKCTL_ASPM_L1);
> > -	}
> >  
> >  	val = 0;
> >  	if (state & PCIE_LINK_STATE_L1_1)
> > @@ -937,21 +923,33 @@ static void pcie_config_aspm_link(struct pcie_link_state *link, u32 state)
> >  		dwstream |= PCI_EXP_LNKCTL_ASPM_L1;
> >  	}
> >  
> > +	/*
> > +	 * Spec r6.2, section 5.5.4, recommends that setting either or both of
> > +	 * the enable bits for ASPM L1 PM Substates must be done while ASPM L1
> > +	 * is disabled. So disable L1 here, and it gets enabled later after the
> > +	 * L1ss configuration has been completed.
> > +	 *
> > +	 * Spec r6.2, section 7.5.3.7, mentions that ASPM L1 must be enabled by
> > +	 * software in the Upstream component on a Link prior to enabling ASPM
> > +	 * L1 in the Downstream component on the Link. When disabling L1,
> > +	 * software must disable ASPM L1 in the Downstream component on a Link
> > +	 * prior to disabling ASPM L1 in the Upstream component on that Link.
> > +	 *
> > +	 * Spec doesn't mention L0s.
> > +	 *
> > +	 * Disable L1 and L0s here, and they get enabled later after the L1ss
> > +	 * configuration has been completed.
> > +	 */
> > +	list_for_each_entry(child, &linkbus->devices, bus_list)
> > +		pcie_config_aspm_dev(child, 0);
> > +	pcie_config_aspm_dev(parent, 0);
> > +
> >  	if (link->aspm_capable & PCIE_LINK_STATE_L1SS)
> >  		pcie_config_aspm_l1ss(link, state);
> >  
> > -	/*
> > -	 * Spec 2.0 suggests all functions should be configured the
> > -	 * same setting for ASPM. Enabling ASPM L1 should be done in
> > -	 * upstream component first and then downstream, and vice
> > -	 * versa for disabling ASPM L1. Spec doesn't mention L0S.
> > -	 */
> > -	if (state & PCIE_LINK_STATE_L1)
> > -		pcie_config_aspm_dev(parent, upstream);
> > +	pcie_config_aspm_dev(parent, upstream);
> >  	list_for_each_entry(child, &linkbus->devices, bus_list)
> >  		pcie_config_aspm_dev(child, dwstream);
> > -	if (!(state & PCIE_LINK_STATE_L1))
> > -		pcie_config_aspm_dev(parent, upstream);
> 
> I think the reason for having pcie_config_aspm_dev(parent) both before
> and after configuring the children is because pcie_config_aspm_link()
> may be called either to enable L1 or to disable it.
> 
> I guess your change always disables ASPM completely (disabling the
> downstream (child) component first, then the upstream), and here we
> are either leaving L1 disabled or enabling it, and in either case it
> should be safe to configure the upstream (parent) component first,
> then the downstream one.
> 
> Of course, we may also enable L0s here, and AFAICS it should always be
> safe to do that in the upstream component first, followed by the
> downstream one.
> 
> Bottom line, this looks good to me, and I think it's nice that this
> removes the "parent then child" or "child then parent" logic here.
> 
Agreed on all points.

> >  	link->aspm_enabled = state;
> >  
> > -- 
> > 2.46.1.824.gd892dcdcdd-goog
> > 



