On Sun, Aug 20, 2017 at 09:06:51AM +0530, Oza Oza wrote: > On Sun, Aug 20, 2017 at 2:34 AM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote: > > On Fri, Aug 04, 2017 at 09:18:16PM +0530, Oza Pawandeep wrote: > >> PERST must be asserted around ~500ms before the reboot is applied. > >> > >> During soft reset (e.g., "reboot" from Linux) on some iproc based SOCs > >> LCPLL clock and PERST both goes off simultaneously. > >> This will cause certain Endpoints Intel NVMe not get detected, upon > >> next boot sequence. > >> > >> This is specifically happening with Intel P3700 400GB series. > >> Endpoint is expecting the clock for some amount of time after PERST is > >> asserted, which is not happening in Stingray (iproc based SOC). > >> This causes NVMe to behave in undefined way. > >> > >> On the contrary, Intel x86 boards will have ref clock running, so we > >> do not see this behavior there. > >> > >> Besides, PCI spec does not stipulate about such timings. > >> In-fact it does not tell us, whether PCIe device should consider > >> refclk to be available or not to be. > >> > >> It is completely up to vendor to design their EP the way they want > >> with respect to ref clock availability. > > > > I am unconvinced. Designing an endpoint that relies on ref clock > > timing not guaranteed by the spec does not sound like a way to build > > reliable hardware. > > > > The PCIe Card Electromechanical spec, r2.0, sec 2.2.3 says the clock > > goes inactive after PERST# goes active, but doesn't specify how long. > > In the absence of a spec requirement, I don't see a reason to assume > > other systems preserve the ref clock after PERST#, so if the Intel > > device requires clocks for 500ms after PERST#, we should see problems > > on other systems. > > The reason you do not see and most likely you will not see on any > other system is > because, the ref clock is supplied by on board oscillator. > when PERST# is asserted, the ref clock is still available. > > but in our case, we do not have on-board clock generator, rather we > rely on the clock > coming from SOC, so if SOC reset is issued, both PERST# and the ref > clock goes off simultaneously. OK. I'm not a hardware person and will have to take your word for this. > please also refer to the link below which stipulate on clk being > active after PERST# being asserted. > http://www.eetimes.com/document.asp?doc_id=1279299 [Figure 2: > Power-down waveforms] This is just a copy of Figure 2-13 from the PCIe CEM spec I cited above. It's better to cite the spec itself than an article based on the spec. > however I am not saying that, this article has more to claim than PCIe spec. > but, I think the PCIe Card Electromechanical spec leaves the margin > for card manufactures to design the card based on the assumption > that ref clock could be available after PERST# is asserted. The only statement in the spec I'm aware of is that "Clock and JTAG go inactive after PERST# goes inactive." Since there's no required time the clock must remain active, a robust card would not assume the clock is available at all after PERST. 500ms is a *huge* length of time; I'd be very surprised if Intel designed a card that required that. I don't feel like we really got to the root cause of this, but this patch only hurts the iproc reboot time, so I'm OK with it. I was just hoping to avoid slowing down your reboot time. > most of the cards do not assume that, but at the least we have seen that, > once particular card which behaves as per the link which I have just > pasted above. > > > > > Sec 2.2 says that on power up, the power rails must be stable at least > > T(PVPERL) (100ms) and reference clocks must be stable at least > > T(PERST-CLK) (100us) before PERST# is deasserted. I think it's more > > likely the problem is here. > > > > The current iproc_pcie_reset() looks like it sleeps 100ms *after* it > > deasserts PERST#. Should that be *before* deasserting PERST#? > > > > T(PERST-CLK) (100us) before PERST# is deasserted. > which we are already waiting for 250us > > T(PVPERL) (100ms), the power rail is stable much before linux comes up. > so I still think we are meeting power up requirements. > > >> 500ms is just based on the observation and taken as safe margin. > >> This patch adds platform shutdown where it should be > >> called in device_shutdown while reboot command is issued. > >> So in sequence first Endpoint Shutdown (e.g. nvme_shutdown) > >> followed by RC shutdown, which issues safe PERST assertion. > >> > >> Signed-off-by: Oza Pawandeep <oza.oza@xxxxxxxxxxxx> > >> Reviewed-by: Ray Jui <ray.jui@xxxxxxxxxxxx> > >> Reviewed-by: Scott Branden <scott.branden@xxxxxxxxxxxx> > >> > >> diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c > >> index 90d2bdd..9512960 100644 > >> --- a/drivers/pci/host/pcie-iproc-platform.c > >> +++ b/drivers/pci/host/pcie-iproc-platform.c > >> @@ -131,6 +131,13 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev) > >> return iproc_pcie_remove(pcie); > >> } > >> > >> +static void iproc_pcie_pltfm_shutdown(struct platform_device *pdev) > >> +{ > >> + struct iproc_pcie *pcie = platform_get_drvdata(pdev); > >> + > >> + iproc_pcie_shutdown(pcie); > >> +} > >> + > >> static struct platform_driver iproc_pcie_pltfm_driver = { > >> .driver = { > >> .name = "iproc-pcie", > >> @@ -138,6 +145,7 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev) > >> }, > >> .probe = iproc_pcie_pltfm_probe, > >> .remove = iproc_pcie_pltfm_remove, > >> + .shutdown = iproc_pcie_pltfm_shutdown, > >> }; > >> module_platform_driver(iproc_pcie_pltfm_driver); > >> > >> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c > >> index 583cee0..ee40651 100644 > >> --- a/drivers/pci/host/pcie-iproc.c > >> +++ b/drivers/pci/host/pcie-iproc.c > >> @@ -627,7 +627,7 @@ static int iproc_pcie_config_write32(struct pci_bus *bus, unsigned int devfn, > >> .write = iproc_pcie_config_write32, > >> }; > >> > >> -static void iproc_pcie_reset(struct iproc_pcie *pcie) > >> +static void iproc_pcie_perst_ctrl(struct iproc_pcie *pcie, bool assert) > >> { > >> u32 val; > >> > >> @@ -639,20 +639,28 @@ static void iproc_pcie_reset(struct iproc_pcie *pcie) > >> if (pcie->ep_is_internal) > >> return; > >> > >> - /* > >> - * Select perst_b signal as reset source. Put the device into reset, > >> - * and then bring it out of reset > >> - */ > >> - val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL); > >> - val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST & > >> - ~RC_PCIE_RST_OUTPUT; > >> - iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val); > >> - udelay(250); > >> - > >> - val |= RC_PCIE_RST_OUTPUT; > >> - iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val); > >> - msleep(100); > >> + if (assert) { > >> + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL); > >> + val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST & > >> + ~RC_PCIE_RST_OUTPUT; > >> + iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val); > >> + udelay(250); > >> + } else { > >> + val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL); > >> + val |= RC_PCIE_RST_OUTPUT; > >> + iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val); > >> + msleep(100); > >> + } > >> +}