On Fri, Dec 11, 2020 at 6:15 AM Daniel Thompson <daniel.thompson@xxxxxxxxxx> wrote: > > I have been chasing down a problem enumerating an NVMe drive on a > Honeycomb LX2K (NXP LX2160A). Specifically the drive can only enumerate > successfully if the we are emitting lots of console messages via a UART. > If the system is booted with `quiet` set then enumeration fails. We really need to get away from work-arounds for device X on host Y. I suspect they are not limited to that combination only... How exactly does it fail to enumerate? There's a (racy) linkup check on config accesses, is it reporting link down and skipping config accesses? > I guessed this would be due to the timing impact of printk-to-UART and > tried to find out where a delay could be added to provoke a successful > enumeration. > > This patch contains the results. The delay length (1ms) was selected by > binary searching downwards until the delay was not effective for these > devices (Honeycomb LX2K and a Western Digital WD Blue SN550). > > I have also included the workaround twice (conditionally compiled). The > first change is the *latest* possible code path that we can deploy a > delay whilst the second is the earliest place I could find. > > The summary is that the critical window were we are currently relying on > a call to the console UART code can "mend" the driver runs from calling > dw_pcie_setup_rc() in host init to just before we read the state in the > link up callback. > > Signed-off-by: Daniel Thompson <daniel.thompson@xxxxxxxxxx> > --- > > Notes: > This patch is RFC (and HACK) because I don't have much clue *why* this > patch works... merely that this is the smallest possible change I need > to replicate the current accidental printk() workaround. Perhaps one > could argue that RFC here stands for request-for-clue. All my > observations and changes here are empirical and I don't know how best to > turn them into something that is not a hack! > > BTW I noticed many other pcie-designware drivers take advantage > of a function called dw_pcie_wait_for_link() in their init paths... > but my naive attempts to add it to the layerscape driver results > in non-booting systems so I haven't embarrassed myself by including > that in the patch! You need to look at what's pending for v5.11, because I reworked this to be more unified. The ordering of init is also possibly changed. The sequence is now like this: dw_pcie_setup_rc(pp); dw_pcie_msi_init(pp); if (!dw_pcie_link_up(pci) && pci->ops->start_link) { ret = pci->ops->start_link(pci); if (ret) goto err_free_msi; } /* Ignore errors, the link may come up later */ dw_pcie_wait_for_link(pci); > drivers/pci/controller/dwc/pci-layerscape.c | 35 +++++++++++++++++++++ > 1 file changed, 35 insertions(+) > > diff --git a/drivers/pci/controller/dwc/pci-layerscape.c b/drivers/pci/controller/dwc/pci-layerscape.c > index f24f79a70d9a..c354904b90ef 100644 > --- a/drivers/pci/controller/dwc/pci-layerscape.c > +++ b/drivers/pci/controller/dwc/pci-layerscape.c > @@ -22,6 +22,8 @@ > > #include "pcie-designware.h" > > +#define WORKAROUND_LATEST_POSSIBLE > + > /* PEX1/2 Misc Ports Status Register */ > #define SCFG_PEXMSCPORTSR(pex_idx) (0x94 + (pex_idx) * 4) > #define LTSSM_STATE_SHIFT 20 > @@ -113,10 +115,31 @@ static int ls_pcie_link_up(struct dw_pcie *pci) > struct ls_pcie *pcie = to_ls_pcie(pci); > u32 state; > > + /* > + * Strictly speaking *this* (before the ioread32) is the latest > + * point a simple delay can be effective. If we move the delay > + * after the ioread32 then the NVMe does not enumerate. > + * > + * However this function appears to be frequently called so an > + * unconditional delay here causes noticeable delay at boot > + * time. Hence we implement the workaround by retrying the read > + * after a short delay if we think we might need to return false. > + */ > + > state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >> > pcie->drvdata->ltssm_shift) & > LTSSM_STATE_MASK; > > +#ifdef WORKAROUND_LATEST_POSSIBLE > + if (state < LTSSM_PCIE_L0) { > + /* see comment above */ > + mdelay(1); > + state = (ioread32(pcie->lut + pcie->drvdata->lut_dbg) >> > + pcie->drvdata->ltssm_shift) & > + LTSSM_STATE_MASK; Side note, I really wonder about the variety of ways in DWC drivers we have to check link state. The default is a debug register... Are the standard PCIe registers for this really broken in these cases? > + } > +#endif > + > if (state < LTSSM_PCIE_L0) > return 0; > > @@ -152,6 +175,18 @@ static int ls_pcie_host_init(struct pcie_port *pp) > > dw_pcie_setup_rc(pp); Might be related to the PORT_LOGIC_SPEED_CHANGE we do in here. > +#ifdef WORKAROUND_EARLIEST_POSSIBLE > + /* > + * This is the earliest point the delay is effective. > + * If we move it before dw_pcie_setup_rc() then the > + * NVMe does not enumerate. > + * > + * 500us is too short to reliably work around the issue > + * hence adopting 1000us here. > + */ > + mdelay(1); > +#endif > + > return 0; > } > > > base-commit: 0477e92881850d44910a7e94fc2c46f96faa131f > -- > 2.29.2 >