[+cc Xiaowei (author of 6bfc9c3a2c70), Hou (author of 8bcca2658558)] On Thu, Jul 28, 2022 at 05:56:28PM +0530, Vidya Sagar wrote: > On 7/28/2022 3:44 AM, Bjorn Helgaas wrote: > > On Wed, Jun 22, 2022 at 09:31:33AM +0530, Vidya Sagar wrote: > > > Platforms that cannot support their core initialization without the > > > reference clock from the host, implement the feature 'core_init_notifier' > > > to indicate the DesignWare sub-system about when their core is getting > > > initialized. Any accesses to the core (Ex:- DBI) would result in system > > > hang in such systems (Ex:- tegra194). This patch moves any access to the > > > core to dw_pcie_ep_init_complete() API which is effectively called only > > > after the core initialization. > > > > I assume this is still broken. I want to fix it. I assume this patch > > fixes it and there are no known problems with it. I assume this can > > be fixed so it works on all platforms, whether they use > > core_init_notifier or not. > > Yes. All your assumptions are correct. > > > I'd like the commit log to be specific about where the hang occurs so > > it's easy for a non-DesignWare expert (me!) to see the problem. E.g., > > on tegra194, X depends on Y, but Y is initialized after X. Say > > specifically what functions X and Y are. > > X = DBI accesses > Y = Core initialization which in turn depends on the REFCLK from the host > > Without this patch, hang happens when DBI registers are accessed without > core being initialized. In the case of Tegra194 at least, core gets > initialized only after REFCLK is available from the host. The way we make > sure that the REFCLK is available from the host is by checking for PERST# > de-assertion interrupt. (PCIe spec mandates that the host must supply REFCLK > before de-asserting PERST# signal). > This patch prevents any accesses to the DBI/Core registers if the platform > says that it supports core_init_notifier. There are several commits involved in this, oldest to newest: e966f7390da9 ("PCI: dwc: Refactor core initialization code for EP mode") Split EP initialization into dw_pcie_ep_init() (which doesn't touch core registers) and dw_pcie_ep_init_complete() (which does). dw_pcie_ep_init of_property_read_u32(np, "num-ib-windows", &ep->num_ib_windows) ep->ib_window_map = devm_kcalloc(..., ep->num_ib_windows, ...) dw_pcie_ep_init_complete # core regs are available dw_pcie_readb_dbi # <-- read core regs 6bfc9c3a2c70 ("PCI: designware-ep: Move the function of getting MSI capability forward") Move MSI capability lookup from dw_pcie_ep_init_complete() to dw_pcie_ep_init(). dw_pcie_ep_init dw_pcie_find_capability(pci, PCI_CAP_ID_MSI) dw_pcie_readw_dbi # <-- read core regs dw_pcie_ep_init_complete # core regs are available This is broken because we access the DBI registers before the core is initialized. 281f1f99cf3a ("PCI: dwc: Detect number of iATU windows") Read num_ib_windows from iATU registers instead of from DT property. dw_pcie_ep_init ib_window_map = devm_kcalloc(..., pci->num_ib_windows, ...) # use dw_pcie_ep_init_complete dw_pcie_setup dw_pcie_iatu_detect_regions dw_pcie_readl_dbi pci->num_ib_windows = ib # init This ordering is broken because we use pci->num_ib_windows before we initialize it. 8bcca2658558 ("PCI: dwc: Move iATU detection earlier") Fix the use-before-init problem by moving init earlier: dw_pcie_ep_init dw_pcie_iatu_detect dw_pcie_readl_dbi # <-- read core regs pci->num_ib_windows = ib # init ib_window_map = devm_kcalloc(..., pci->num_ib_windows, ...) # use dw_pcie_ep_init_complete # core regs are available dw_pcie_setup This fixed the use-before-init, but it causes the hang by accessing the DBI registers before the core is initialized. This patch addresses the problems of 6bfc9c3a2c70 and 8bcca2658558 by moving dw_pcie_iatu_detect() and dw_pcie_ep_find_capability() from dw_pcie_ep_init() to dw_pcie_ep_init_complete(). We probably need Fixes: tags for both. But I don't think this patch is enough, and I have a lot of questions: 1) The "core_init_notifier" mechanism is kind of obtuse. IIUC, any driver that sets ".core_init_notifier = true" is responsible for calling dw_pcie_ep_init_complete() at some point. But that's not explicit at all in the code. There's no "notifier" involved. I suppose the idea is that the *interrupt* is the notifier, but the naming doesn't help connect things. 2) I still want to know exactly where (what function) the critical transition that enables DBI access is. I guess it's qcom_pcie_perst_deassert() for qcom and pex_ep_event_pex_rst_deassert() for tegra? And for drivers that don't set .core_init_notifier, DBI is accessible any time in dw_pcie_ep_init() or later? 3) How do these interrupt handlers work? I'm used to the model of: Install interrupt handler Do something to the device so it will generate an interrupt Later, the device generates the interrupt Execute the interrupt handler We register tegra_pcie_ep_pex_rst_irq() as an interrupt handler, but what touches the device and starts the process that causes the interrupt? 4) 6bfc9c3a2c70 moved the MSI capability lookup to dw_pcie_ep_init() because .ep_init() functions, e.g., ls_pcie_ep_init(), depend on ep_func->msi_cap. Moving the MSI lookup to dw_pcie_ep_init_complete() will break that again, won't it? 5) dw_pcie_ep_init() calls ep->ops->ep_init(ep), and almost all of those .ep_init() methods, e.g., dra7xx_pcie_ep_init(), ls_pcie_ep_init(), etc., call dw_pcie_ep_reset_bar(), which does DBI writes. These are before dw_pcie_ep_init_complete(), so they are broken per the design (though they probably *work* because the drivers don't set .core_init_notifier, so DBI accesses probably work earlier). The only .ep_init() in a driver that sets .core_init_notifier is qcom_pcie_ep_init(). That *looks* like it should actually be broken because it runs before dw_pcie_ep_init_complete(). 6) What's going on with the CORE_INIT and LINK_UP notifiers? dw_pcie_ep_init_notify() is only called by qcom and tegra. dw_pcie_ep_linkup() is only called by dra7xx, qcom, and tegra. As far as I can tell, nobody at all registers to handle those events except a test. I think it's pointless to have that code if nobody uses it. > > > --- > > > .../pci/controller/dwc/pcie-designware-ep.c | 88 +++++++++++-------- > > > 1 file changed, 49 insertions(+), 39 deletions(-) > > > > > > diff --git a/drivers/pci/controller/dwc/pcie-designware-ep.c b/drivers/pci/controller/dwc/pcie-designware-ep.c > > > index 0eda8236c125..9feec720175f 100644 > > > --- a/drivers/pci/controller/dwc/pcie-designware-ep.c > > > +++ b/drivers/pci/controller/dwc/pcie-designware-ep.c > > > @@ -639,9 +639,14 @@ static unsigned int dw_pcie_ep_find_ext_capability(struct dw_pcie *pci, int cap) > > > int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep) > > > { > > > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > > > + struct dw_pcie_ep_func *ep_func; > > > + struct device *dev = pci->dev; > > > + struct pci_epc *epc = ep->epc; > > > unsigned int offset; > > > unsigned int nbars; > > > u8 hdr_type; > > > + u8 func_no; > > > + void *addr; > > > u32 reg; > > > int i; > > > > > > @@ -654,6 +659,42 @@ int dw_pcie_ep_init_complete(struct dw_pcie_ep *ep) > > > return -EIO; > > > } > > > > > > + dw_pcie_iatu_detect(pci); > > > + > > > + ep->ib_window_map = devm_kcalloc(dev, > > > + BITS_TO_LONGS(pci->num_ib_windows), > > > + sizeof(long), > > > + GFP_KERNEL); > > > + if (!ep->ib_window_map) > > > + return -ENOMEM; > > > + > > > + ep->ob_window_map = devm_kcalloc(dev, > > > + BITS_TO_LONGS(pci->num_ob_windows), > > > + sizeof(long), > > > + GFP_KERNEL); > > > + if (!ep->ob_window_map) > > > + return -ENOMEM; > > > + > > > + addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t), > > > + GFP_KERNEL); > > > + if (!addr) > > > + return -ENOMEM; > > > + ep->outbound_addr = addr; > > > + > > > + for (func_no = 0; func_no < epc->max_functions; func_no++) { > > > + ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL); > > > + if (!ep_func) > > > + return -ENOMEM; > > > + > > > + ep_func->func_no = func_no; > > > + ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no, > > > + PCI_CAP_ID_MSI); > > > + ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no, > > > + PCI_CAP_ID_MSIX); > > > + > > > + list_add_tail(&ep_func->list, &ep->func_list); > > > + } > > > + > > > offset = dw_pcie_ep_find_ext_capability(pci, PCI_EXT_CAP_ID_REBAR); > > > > > > dw_pcie_dbi_ro_wr_en(pci); > > > @@ -677,8 +718,6 @@ EXPORT_SYMBOL_GPL(dw_pcie_ep_init_complete); > > > int dw_pcie_ep_init(struct dw_pcie_ep *ep) > > > { > > > int ret; > > > - void *addr; > > > - u8 func_no; > > > struct resource *res; > > > struct pci_epc *epc; > > > struct dw_pcie *pci = to_dw_pcie_from_ep(ep); > > > @@ -686,7 +725,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep) > > > struct platform_device *pdev = to_platform_device(dev); > > > struct device_node *np = dev->of_node; > > > const struct pci_epc_features *epc_features; > > > - struct dw_pcie_ep_func *ep_func; > > > > > > INIT_LIST_HEAD(&ep->func_list); > > > > > > @@ -708,8 +746,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep) > > > } > > > } > > > > > > - dw_pcie_iatu_detect(pci); > > > - > > > res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "addr_space"); > > > if (!res) > > > return -EINVAL; > > > @@ -717,26 +753,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep) > > > ep->phys_base = res->start; > > > ep->addr_size = resource_size(res); > > > > > > - ep->ib_window_map = devm_kcalloc(dev, > > > - BITS_TO_LONGS(pci->num_ib_windows), > > > - sizeof(long), > > > - GFP_KERNEL); > > > - if (!ep->ib_window_map) > > > - return -ENOMEM; > > > - > > > - ep->ob_window_map = devm_kcalloc(dev, > > > - BITS_TO_LONGS(pci->num_ob_windows), > > > - sizeof(long), > > > - GFP_KERNEL); > > > - if (!ep->ob_window_map) > > > - return -ENOMEM; > > > - > > > - addr = devm_kcalloc(dev, pci->num_ob_windows, sizeof(phys_addr_t), > > > - GFP_KERNEL); > > > - if (!addr) > > > - return -ENOMEM; > > > - ep->outbound_addr = addr; > > > - > > > if (pci->link_gen < 1) > > > pci->link_gen = of_pci_get_max_link_speed(np); > > > > > > @@ -753,20 +769,6 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep) > > > if (ret < 0) > > > epc->max_functions = 1; > > > > > > - for (func_no = 0; func_no < epc->max_functions; func_no++) { > > > - ep_func = devm_kzalloc(dev, sizeof(*ep_func), GFP_KERNEL); > > > - if (!ep_func) > > > - return -ENOMEM; > > > - > > > - ep_func->func_no = func_no; > > > - ep_func->msi_cap = dw_pcie_ep_find_capability(ep, func_no, > > > - PCI_CAP_ID_MSI); > > > - ep_func->msix_cap = dw_pcie_ep_find_capability(ep, func_no, > > > - PCI_CAP_ID_MSIX); > > > - > > > - list_add_tail(&ep_func->list, &ep->func_list); > > > - } > > > - > > > if (ep->ops->ep_init) > > > ep->ops->ep_init(ep); > > > > > > @@ -790,6 +792,14 @@ int dw_pcie_ep_init(struct dw_pcie_ep *ep) > > > return 0; > > > } > > > > > > + /* > > > + * NOTE:- Avoid accessing the hardware (Ex:- DBI space) before this > > > + * step as platforms that implement 'core_init_notifier' feature may > > > + * not have the hardware ready (i.e. core initialized) for access > > > + * (Ex: tegra194). Any hardware access on such platforms result > > > + * in system hard hang. > > > + */ > > > + > > > return dw_pcie_ep_init_complete(ep); > > > } > > > EXPORT_SYMBOL_GPL(dw_pcie_ep_init); > > > -- > > > 2.17.1 > > >