On Wed, Mar 05, 2025 at 11:38:26AM -0600, Bjorn Helgaas wrote:
> On Tue, Mar 04, 2025 at 10:41:54PM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Mar 05, 2025 at 12:46:38AM +0900, Krzysztof Wilczyński wrote:
> > > > On Mon, 3 Mar 2025 at 20:47, Krzysztof Wilczyński <kw@xxxxxxxxx> wrote:
> > > > > [...]
> > > > > > > +int dwc_pcie_debugfs_init(struct dw_pcie *pci)
> > > > > > > +{
> > > > > > > +	char dirname[DWC_DEBUGFS_BUF_MAX];
> > > > > > > +	struct device *dev = pci->dev;
> > > > > > > +	struct debugfs_info *debugfs;
> > > > > > > +	struct dentry *dir;
> > > > > > > +	int ret;
> > > > > > > +
> > > > > > > +	/* Create main directory for each platform driver */
> > > > > > > +	snprintf(dirname, DWC_DEBUGFS_BUF_MAX, "dwc_pcie_%s", dev_name(dev));
> > > > > > > +	dir = debugfs_create_dir(dirname, NULL);
> > > > > > > +	debugfs = devm_kzalloc(dev, sizeof(*debugfs), GFP_KERNEL);
> > > > > > > +	if (!debugfs)
> > > > > > > +		return -ENOMEM;
> > > > > > > +
> > > > > > > +	debugfs->debug_dir = dir;
> > > > > > > +	pci->debugfs = debugfs;
> > > > > > > +	ret = dwc_pcie_rasdes_debugfs_init(pci, dir);
> > > > > > > +	if (ret)
> > > > > > > +		dev_dbg(dev, "RASDES debugfs init failed\n");
> > > > > >
> > > > > > What will happen if ret != 0? still return 0?
> > > >
> > > > And that is exactly what happens on Gray Hawk Single with R-Car
> > > > V4M: dw_pcie_find_rasdes_capability() returns NULL, causing
> > > > dwc_pcie_rasdes_debugfs_init() to return -ENODEV.
> > > >
> > > > Debugfs issues should never be propagated upstream!
> > ...
>
> > > > So while applying, you changed this like:
> > > >
> > > > 	ret = dwc_pcie_rasdes_debugfs_init(pci, dir);
> > > > -	if (ret)
> > > > -		dev_dbg(dev, "RASDES debugfs init failed\n");
> > > > +	if (ret) {
> > > > +		dev_err(dev, "failed to initialize RAS DES debugfs\n");
> > > > +		return ret;
> > > > +	}
> > > >
> > > > 	return 0;
> > > >
> > > > Hence this is now a fatal error, causing the probe to fail.
>
> > Even though debugfs_init() failure is not supposed to fail the probe(),
> > dwc_pcie_rasdes_debugfs_init() has a devm_kzalloc() and propagating that
> > failure would be canonically correct IMO.
>
> I'm not sure about this.  What's the requirement to propagate
> devm_kzalloc() failures?  I think devres will free any allocs that
> were successful regardless.
>
> IIUC, we resolved the Gray Hawk Single issue by changing
> dwc_pcie_rasdes_debugfs_init() to return success without doing
> anything when there's no RAS DES Capability.
>
> But dwc_pcie_debugfs_init() can still return failure, and that still
> causes dw_pcie_ep_init_registers() to fail, which breaks the "don't
> propagate debugfs issues upstream" rule:
>
>   int dw_pcie_ep_init_registers(struct dw_pcie_ep *ep)
>   {
>           ...
>           ret = dwc_pcie_debugfs_init(pci);
>           if (ret)
>                   goto err_remove_edma;
>
>           return 0;
>
>   err_remove_edma:
>           dw_pcie_edma_remove(pci);
>
>           return ret;
>   }
>
> We can say that kzalloc() failure should "never" happen, and therefore
> it's OK to fail the driver probe if it happens, but that doesn't seem
> like a strong argument for breaking the "don't propagate debugfs
> issues" rule.  And someday there may be other kinds of failures from
> dwc_pcie_debugfs_init().
>

Fine with me. I was not too sure about propagating failure either.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
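
For illustration only, here is a minimal userspace sketch of the "don't propagate debugfs issues" pattern discussed above. It is not the actual driver code: every name prefixed with mock_ is hypothetical, fprintf() stands in for dev_err(), and the flags in struct mock_pci merely model the failure cases mentioned in the thread (no RAS DES capability, allocation failure). Making the debugfs init function return void is one possible way to enforce the rule, assumed here for the sake of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for struct dw_pcie; the fields only model test cases. */
struct mock_pci {
	int has_rasdes_cap;     /* models the RAS DES capability being present */
	int alloc_fails;        /* models the top-level allocation failing */
	int rasdes_alloc_fails; /* models the RAS DES allocation failing */
	int debugfs_ready;      /* set once the top-level debugfs state exists */
};

/* Returns 0 without doing anything when the capability is absent. */
static int mock_rasdes_debugfs_init(struct mock_pci *pci)
{
	assert(pci != NULL);

	if (!pci->has_rasdes_cap)
		return 0;
	if (pci->rasdes_alloc_fails)
		return -ENOMEM;
	return 0;
}

/* Returns void, so callers have no error to propagate (an assumption of this sketch). */
static void mock_debugfs_init(struct mock_pci *pci)
{
	int ret;

	assert(pci != NULL);

	if (pci->alloc_fails) {
		fprintf(stderr, "debugfs state allocation failed, continuing\n");
		return;
	}
	pci->debugfs_ready = 1;

	ret = mock_rasdes_debugfs_init(pci);
	if (ret)
		fprintf(stderr, "RAS DES debugfs init failed: %d\n", ret);
}

/* Models the caller: a debugfs failure never reaches the probe path. */
static int mock_ep_init_registers(struct mock_pci *pci)
{
	mock_debugfs_init(pci);
	return 0;
}
```

With this shape, mock_ep_init_registers() returns 0 in all three modelled cases, and the only visible effect of a failure is a log message plus missing debugfs state.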