On Thu, Aug 15, 2024 at 05:47:17PM -0500, Bjorn Helgaas wrote:
> [+cc Vidya, Jon since tegra194 does similar things]
>
> On Mon, Jul 29, 2024 at 05:52:45PM +0530, Manivannan Sadhasivam wrote:
> > Currently, the endpoint cleanup function dw_pcie_ep_cleanup() and EPF
> > deinit notify function pci_epc_deinit_notify() are called during the
> > execution of qcom_pcie_perst_assert() i.e., when the host has asserted
> > PERST#. But quickly after this step, refclk will also be disabled by the
> > host.
> >
> > All of the Qcom endpoint SoCs supported as of now depend on the refclk from
> > the host for keeping the controller operational. Due to this limitation,
> > any access to the hardware registers in the absence of refclk will result
> > in a whole endpoint crash. Unfortunately, most of the controller cleanups
> > require accessing the hardware registers (like eDMA cleanup performed in
> > dw_pcie_ep_cleanup(), powering down MHI EPF etc...). So these cleanup
> > functions are currently causing the crash in the endpoint SoC once host
> > asserts PERST#.
> >
> > One way to address this issue is by generating the refclk in the endpoint
> > itself and not depending on the host. But that is not always possible as
> > some of the endpoint designs do require the endpoint to consume refclk from
> > the host (as I was told by the Qcom engineers).
> >
> > So let's fix this crash by moving the controller cleanups to the start of
> > the qcom_pcie_perst_deassert() function. qcom_pcie_perst_deassert() is
> > called whenever the host has deasserted PERST# and it is guaranteed that
> > the refclk would be active at this point. So at the start of this function,
> > the controller cleanup can be performed. Once finished, rest of the code
> > execution for PERST# deassert can continue as usual.
>
> What makes this v6.11 material?  Does it fix a problem we added in
> v6.11-rc1?
>

No, this is not 6.11 material, but the rest of the patches I shared
offline are.

> Is there a Fixes: commit?
>

Hmm, the commit that added the controller could be a valid Fixes tag.

> This patch essentially does this:
>
>   qcom_pcie_perst_assert
> -   pci_epc_deinit_notify
> -   dw_pcie_ep_cleanup
>     qcom_pcie_disable_resources
>
>   qcom_pcie_perst_deassert
> +   if (pcie_ep->cleanup_pending)
> +     pci_epc_deinit_notify(pci->ep.epc);
> +     dw_pcie_ep_cleanup(&pci->ep);
>     dw_pcie_ep_init_registers
>     pci_epc_init_notify
>
> Maybe it makes sense to call both pci_epc_deinit_notify() and
> pci_epc_init_notify() from the PERST# deassert function, but it makes
> me question whether we really need both.
>

There is really no need to call pci_epc_deinit_notify() during the first
deassert (i.e., during the ep boot) because there are no cleanups to be
done. It is only needed during a successive PERST# assert + deassert.

> pcie-tegra194.c has a similar structure:
>
>   pex_ep_event_pex_rst_assert
>     pci_epc_deinit_notify
>     dw_pcie_ep_cleanup
>
>   pex_ep_event_pex_rst_deassert
>     dw_pcie_ep_init_registers
>     pci_epc_init_notify
>
> Is there a reason to make them different, or could/should a similar
> change be made to tegra?
>

Design-wise both drivers are similar, so it could apply. I didn't spin a
patch because if testing of the tegra driver gets delayed (I've seen this
before), then I do not want to stall merging the whole series. For Qcom it
is important to get this merged asap to avoid the crash.

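To make the ordering above concrete, here is a small user-space model of
the deferred cleanup. It is only a sketch and not the driver code: struct
ep_model, its fields and all ep_*() helpers are hypothetical names, and
setting the flag from the PERST# assert path is an assumption, since that
hunk is not quoted in this mail.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the endpoint state; not struct qcom_pcie_ep. */
struct ep_model {
	bool refclk_active;    /* host-provided refclk is present */
	bool initialized;      /* controller registers have been set up */
	bool cleanup_pending;  /* cleanup deferred until refclk is back */
	int cleanup_count;     /* number of times cleanup actually ran */
	int bad_access_count;  /* register accesses attempted without refclk */
};

/* Models any register access; without refclk the real SoC would crash. */
static void ep_touch_registers(struct ep_model *ep)
{
	if (!ep->refclk_active)
		ep->bad_access_count++;
}

/* Models pci_epc_deinit_notify() followed by dw_pcie_ep_cleanup(). */
static void ep_cleanup(struct ep_model *ep)
{
	ep_touch_registers(ep);
	ep->initialized = false;
	ep->cleanup_count++;
}

/* Host asserted PERST#: do not touch registers, only note the pending work. */
static void ep_perst_assert(struct ep_model *ep)
{
	ep->cleanup_pending = true;  /* assumption: flag is set on assert */
	ep->refclk_active = false;   /* host disables refclk soon after */
}

/* Host deasserted PERST#: refclk is active again, so cleanup is safe now. */
static void ep_perst_deassert(struct ep_model *ep)
{
	ep->refclk_active = true;

	if (ep->cleanup_pending) {
		ep_cleanup(ep);
		ep->cleanup_pending = false;
	}

	ep_touch_registers(ep);      /* models dw_pcie_ep_init_registers() */
	ep->initialized = true;
}

/* Boot, then one PERST# assert + deassert cycle; returns the final state. */
struct ep_model ep_model_demo(void)
{
	struct ep_model ep = { 0 };

	ep_perst_deassert(&ep);  /* first deassert: nothing to clean up */
	ep_perst_assert(&ep);    /* host resets the link, refclk goes away */
	ep_perst_deassert(&ep);  /* deferred cleanup runs here */

	return ep;
}
```

In this model the cleanup runs exactly once, on the second deassert, and
no register access ever happens while refclk is off. The first deassert
skips the cleanup entirely, which matches the ep boot case described
above.
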
> > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx>
> > ---
> >  drivers/pci/controller/dwc/pcie-qcom-ep.c | 12 ++++++++++--
> >  1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/dwc/pcie-qcom-ep.c b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > index 2319ff2ae9f6..e024b4dcd76d 100644
> > --- a/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > +++ b/drivers/pci/controller/dwc/pcie-qcom-ep.c
> > @@ -186,6 +186,8 @@ struct qcom_pcie_ep_cfg {
> >   * @link_status: PCIe Link status
> >   * @global_irq: Qualcomm PCIe specific Global IRQ
> >   * @perst_irq: PERST# IRQ
> > + * @cleanup_pending: Cleanup is pending for the controller (because refclk is
> > + *                   needed for cleanup)
> >   */
> >  struct qcom_pcie_ep {
> >          struct dw_pcie pci;
> > @@ -214,6 +216,7 @@ struct qcom_pcie_ep {
> >          enum qcom_pcie_ep_link_status link_status;
> >          int global_irq;
> >          int perst_irq;
> > +        bool cleanup_pending;
> >  };
> >
> >  static int qcom_pcie_ep_core_reset(struct qcom_pcie_ep *pcie_ep)
> > @@ -389,6 +392,12 @@ static int qcom_pcie_perst_deassert(struct dw_pcie *pci)
> >                  return ret;
> >          }
> >
> > +        if (pcie_ep->cleanup_pending) {
>
> Do we really need this flag?  I assume the cleanup functions could
> tell whether any previous setup was done?
>

Not so. Some cleanup functions may trigger a warning if called before
'setup'. I think dw_edma_remove() that is part of dw_pcie_ep_cleanup()
does that IIRC.

- Mani

--
மணிவண்ணன் சதாசிவம்