Re: [PATCH v6 2/2] PCI: iproc: add device shutdown for PCI RC

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Sun, 20 Aug 2017 16:25:43 -0500

On Sun, Aug 20, 2017 at 09:06:51AM +0530, Oza Oza wrote:
> On Sun, Aug 20, 2017 at 2:34 AM, Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Fri, Aug 04, 2017 at 09:18:16PM +0530, Oza Pawandeep wrote:
> >> PERST must be asserted around ~500ms before the reboot is applied.
> >>
> >> During soft reset (e.g., "reboot" from Linux) on some iproc based SOCs
> >> LCPLL clock and PERST both goes off simultaneously.
> >> This will cause certain Endpoints Intel NVMe not get detected, upon
> >> next boot sequence.
> >>
> >> This is specifically happening with Intel P3700 400GB series.
> >> Endpoint is expecting the clock for some amount of time after PERST is
> >> asserted, which is not happening in Stingray (iproc based SOC).
> >> This causes NVMe to behave in undefined way.
> >>
> >> On the contrary, Intel x86 boards will have ref clock running, so we
> >> do not see this behavior there.
> >>
> >> Besides, PCI spec does not stipulate about such timings.
> >> In-fact it does not tell us, whether PCIe device should consider
> >> refclk to be available or not to be.
> >>
> >> It is completely up to vendor to design their EP the way they want
> >> with respect to ref clock availability.
> >
> > I am unconvinced.  Designing an endpoint that relies on ref clock
> > timing not guaranteed by the spec does not sound like a way to build
> > reliable hardware.
> >
> > The PCIe Card Electromechanical spec, r2.0, sec 2.2.3 says the clock
> > goes inactive after PERST# goes active, but doesn't specify how long.
> > In the absence of a spec requirement, I don't see a reason to assume
> > other systems preserve the ref clock after PERST#, so if the Intel
> > device requires clocks for 500ms after PERST#, we should see problems
> > on other systems.
> 
> The reason you do not see and most likely you will not see on any
> other system is
> because, the ref clock is supplied by on board oscillator.
> when PERST# is asserted, the ref clock is still available.
> 
> but in our case, we do not have on-board clock generator, rather we
> rely on the clock
> coming from SOC, so if SOC reset is issued, both PERST# and the ref
> clock goes off simultaneously.

OK.  I'm not a hardware person and will have to take your word for
this.

> please also refer to the link below which stipulate on clk being
> active after PERST# being asserted.
> http://www.eetimes.com/document.asp?doc_id=1279299  [Figure 2:
> Power-down waveforms]

This is just a copy of Figure 2-13 from the PCIe CEM spec I cited
above.  It's better to cite the spec itself than an article based on
the spec.

> however I am not saying that, this article has more to claim than PCIe spec.
> but, I think the PCIe Card Electromechanical spec leaves the margin
> for card manufactures to design the card based on the assumption
> that ref clock could be available after PERST#  is asserted.

The only statement in the spec I'm aware of is that "Clock and JTAG go
inactive after PERST# goes inactive."  Since there's no required time
the clock must remain active, a robust card would not assume the clock
is available at all after PERST.  500ms is a *huge* length of time;
I'd be very surprised if Intel designed a card that required that.

I don't feel like we really got to the root cause of this, but this
patch only hurts the iproc reboot time, so I'm OK with it.  I was just
hoping to avoid slowing down your reboot time.

> most of the cards do not assume that, but at the least we have seen that,
> once particular card which behaves as per the link which I have just
> pasted above.
> 
> >
> > Sec 2.2 says that on power up, the power rails must be stable at least
> > T(PVPERL) (100ms) and reference clocks must be stable at least
> > T(PERST-CLK) (100us) before PERST# is deasserted.  I think it's more
> > likely the problem is here.
> >
> > The current iproc_pcie_reset() looks like it sleeps 100ms *after* it
> > deasserts PERST#.  Should that be *before* deasserting PERST#?
> >
> 
> T(PERST-CLK) (100us) before PERST# is deasserted.
> which we are already waiting for 250us
> 
> T(PVPERL) (100ms), the power rail is stable much before linux comes up.
> so I still think we are meeting power up requirements.
> 
> >> 500ms is just based on the observation and taken as safe margin.
> >> This patch adds platform shutdown where it should be
> >> called in device_shutdown while reboot command is issued.
> >> So in sequence first Endpoint Shutdown (e.g. nvme_shutdown)
> >> followed by RC shutdown, which issues safe PERST assertion.
> >>
> >> Signed-off-by: Oza Pawandeep <oza.oza@xxxxxxxxxxxx>
> >> Reviewed-by: Ray Jui <ray.jui@xxxxxxxxxxxx>
> >> Reviewed-by: Scott Branden <scott.branden@xxxxxxxxxxxx>
> >>
> >> diff --git a/drivers/pci/host/pcie-iproc-platform.c b/drivers/pci/host/pcie-iproc-platform.c
> >> index 90d2bdd..9512960 100644
> >> --- a/drivers/pci/host/pcie-iproc-platform.c
> >> +++ b/drivers/pci/host/pcie-iproc-platform.c
> >> @@ -131,6 +131,13 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
> >>       return iproc_pcie_remove(pcie);
> >>  }
> >>
> >> +static void iproc_pcie_pltfm_shutdown(struct platform_device *pdev)
> >> +{
> >> +     struct iproc_pcie *pcie = platform_get_drvdata(pdev);
> >> +
> >> +     iproc_pcie_shutdown(pcie);
> >> +}
> >> +
> >>  static struct platform_driver iproc_pcie_pltfm_driver = {
> >>       .driver = {
> >>               .name = "iproc-pcie",
> >> @@ -138,6 +145,7 @@ static int iproc_pcie_pltfm_remove(struct platform_device *pdev)
> >>       },
> >>       .probe = iproc_pcie_pltfm_probe,
> >>       .remove = iproc_pcie_pltfm_remove,
> >> +     .shutdown = iproc_pcie_pltfm_shutdown,
> >>  };
> >>  module_platform_driver(iproc_pcie_pltfm_driver);
> >>
> >> diff --git a/drivers/pci/host/pcie-iproc.c b/drivers/pci/host/pcie-iproc.c
> >> index 583cee0..ee40651 100644
> >> --- a/drivers/pci/host/pcie-iproc.c
> >> +++ b/drivers/pci/host/pcie-iproc.c
> >> @@ -627,7 +627,7 @@ static int iproc_pcie_config_write32(struct pci_bus *bus, unsigned int devfn,
> >>       .write = iproc_pcie_config_write32,
> >>  };
> >>
> >> -static void iproc_pcie_reset(struct iproc_pcie *pcie)
> >> +static void iproc_pcie_perst_ctrl(struct iproc_pcie *pcie, bool assert)
> >>  {
> >>       u32 val;
> >>
> >> @@ -639,20 +639,28 @@ static void iproc_pcie_reset(struct iproc_pcie *pcie)
> >>       if (pcie->ep_is_internal)
> >>               return;
> >>
> >> -     /*
> >> -      * Select perst_b signal as reset source. Put the device into reset,
> >> -      * and then bring it out of reset
> >> -      */
> >> -     val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> >> -     val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
> >> -             ~RC_PCIE_RST_OUTPUT;
> >> -     iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> >> -     udelay(250);
> >> -
> >> -     val |= RC_PCIE_RST_OUTPUT;
> >> -     iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> >> -     msleep(100);
> >> +     if (assert) {
> >> +             val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> >> +             val &= ~EP_PERST_SOURCE_SELECT & ~EP_MODE_SURVIVE_PERST &
> >> +                     ~RC_PCIE_RST_OUTPUT;
> >> +             iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> >> +             udelay(250);
> >> +     } else {
> >> +             val = iproc_pcie_read_reg(pcie, IPROC_PCIE_CLK_CTRL);
> >> +             val |= RC_PCIE_RST_OUTPUT;
> >> +             iproc_pcie_write_reg(pcie, IPROC_PCIE_CLK_CTRL, val);
> >> +             msleep(100);
> >> +     }
> >> +}