On Fri, Aug 23, 2024 at 10:11:33AM +0530, Manivannan Sadhasivam wrote: > On Thu, Aug 22, 2024 at 12:31:33PM -0500, Bjorn Helgaas wrote: > > On Thu, Aug 22, 2024 at 09:10:25PM +0530, Manivannan Sadhasivam wrote: > > > On Thu, Aug 22, 2024 at 10:16:58AM -0500, Bjorn Helgaas wrote: > > > > On Thu, Aug 22, 2024 at 12:18:23PM +0530, Manivannan Sadhasivam wrote: > > > > > On Wed, Aug 21, 2024 at 05:56:18PM -0500, Bjorn Helgaas wrote: > > > > > ... > > > > > > > > > > Although I do have the question of what happens if the RC deasserts > > > > > > PERST# before qcom-ep is loaded. We probably don't execute > > > > > > qcom_pcie_perst_deassert() in that case, so how does the init happen? > > > > > > > > > > PERST# is a level trigger signal. So even if the host has asserted > > > > > it before EP booted, the level will stay low and ep will detect it > > > > > while booting. > > > > > > > > The PERST# signal itself is definitely level oriented. > > > > > > > > I'm still skeptical about the *interrupt* from the PCIe controller > > > > being level-triggered, as I mentioned here: > > > > https://lore.kernel.org/r/20240815224735.GA57931@bhelgaas > > > > > > Sorry, that comment got buried into my inbox. So didn't get a chance > > > to respond. > > > > > > > tegra194 is also dwc-based and has a similar PERST# interrupt but > > > > it's edge-triggered (tegra_pcie_ep_pex_rst_irq()), which I think > > > > is a cleaner implementation. Then you don't have to remember the > > > > current state, switch between high and low trigger, worry about > > > > races and missing a pulse, etc. > > > > > > I did try to mimic what tegra194 did when I wrote the qcom-ep > > > driver, but it didn't work. If we use the level triggered interrupt > > > as edge, the interrupt will be missed if we do not listen at the > > > right time (when PERST# goes from high to low and vice versa). > > > > > > I don't know how tegra194 interrupt controller is wired up, but IIUC > > > they will need to boot the endpoint first and then host to catch the > > > PERST# interrupt. Otherwise, the endpoint will never see the > > > interrupt until host toggles it again. > > > > Having to control the boot ordering of endpoint and host is definitely > > problematic. > > > > What is the nature of the crash when we try to enable the PHY when > > Refclk is not available? The endpoint has no control over when the > > host asserts/deasserts PERST#. If PERST# happens to be asserted while > > the endpoint is enabling the PHY, and this causes some kind of crash > > that the endpoint driver can't easily recover from, that's a serious > > robustness problem. > > The whole endpoint SoC crashes if the refclk is not available during > phy_power_on() as the PHY driver tries to access some register on Dmitry's > platform (I did not see this crash on SM8450 SoC though). > > If we keep the enable_resources() during probe() then the race condition you > observed above could apply. So removing that from probe() will also make the > race condition go away, Example: 1) host deasserts PERST# 2) qcom-ep handles PERST# IRQ 3) qcom_pcie_ep_perst_irq_thread() calls qcom_pcie_perst_deassert() 4) host asserts PERST#, Refclk no longer valid 5) qcom_pcie_perst_deassert() calls qcom_pcie_enable_resources() 6) qcom_pcie_enable_resources() enables PHY I don't see what prevents the PERST# assertion at 4. It sounds like the endpoint SoC crashes at 6.