Re: [PATCH v7 3/5] Add debugfs based silicon debug support in DWC

Hi Krzysztof,

(CC corrected)

This patch is now commit 1ff54f4cbaed9ec6 ("PCI: dwc: Add debugfs
based Silicon Debug support for DWC") in pci/next (next-20250304).

On Mon, 3 Mar 2025 at 20:47, Krzysztof Wilczyński <kw@xxxxxxxxx> wrote:
> [...]
> > > +int dwc_pcie_debugfs_init(struct dw_pcie *pci)
> > > +{
> > > +   char dirname[DWC_DEBUGFS_BUF_MAX];
> > > +   struct device *dev = pci->dev;
> > > +   struct debugfs_info *debugfs;
> > > +   struct dentry *dir;
> > > +   int ret;
> > > +
> > > +   /* Create main directory for each platform driver */
> > > +   snprintf(dirname, DWC_DEBUGFS_BUF_MAX, "dwc_pcie_%s", dev_name(dev));
> > > +   dir = debugfs_create_dir(dirname, NULL);
> > > +   debugfs = devm_kzalloc(dev, sizeof(*debugfs), GFP_KERNEL);
> > > +   if (!debugfs)
> > > +           return -ENOMEM;
> > > +
> > > +   debugfs->debug_dir = dir;
> > > +   pci->debugfs = debugfs;
> > > +   ret = dwc_pcie_rasdes_debugfs_init(pci, dir);
> > > +   if (ret)
> > > +           dev_dbg(dev, "RASDES debugfs init failed\n");
> >
> > What will happen if ret != 0? still return 0?

And that is exactly what happens on Gray Hawk Single with R-Car
V4M: dw_pcie_find_rasdes_capability() returns NULL, causing
dwc_pcie_rasdes_debugfs_init() to return -ENODEV.

> Given that callers of dwc_pcie_debugfs_init() check for errors,

Debugfs issues should never be propagated upstream!
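
To make the difference concrete, here is a minimal userspace sketch, not the
actual kernel code. All function names are hypothetical stand-ins, and the
"capability present" flag is an assumption used to simulate the missing
RAS DES capability. It contrasts a debugfs init helper that only logs the
failure with one that returns it to the caller.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for dwc_pcie_rasdes_debugfs_init(): returns
 * -ENODEV when the (simulated) RAS DES capability is absent.
 */
static int rasdes_debugfs_init_stub(int has_rasdes_cap)
{
	return has_rasdes_cap ? 0 : -ENODEV;
}

/* Non-fatal variant: log the failure, but never report it to the caller. */
static int debugfs_init_nonfatal(int has_rasdes_cap)
{
	int ret = rasdes_debugfs_init_stub(has_rasdes_cap);

	if (ret)
		fprintf(stderr, "RASDES debugfs init failed: %d\n", ret);

	return 0;
}

/* Fatal variant: the failure is returned, so a caller that checks it aborts. */
static int debugfs_init_fatal(int has_rasdes_cap)
{
	int ret = rasdes_debugfs_init_stub(has_rasdes_cap);

	if (ret) {
		fprintf(stderr, "failed to initialize RAS DES debugfs\n");
		return ret;
	}

	return 0;
}
```

With the capability absent, the non-fatal variant still returns 0, so a
probe function that checks the return value would carry on without the
debug files. The fatal variant hands -ENODEV back to the caller.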

> this probably should correctly bubble up any failure coming from
> dwc_pcie_rasdes_debugfs_init().
>
> I made updates to the code directly on the current branch, have a look:

So while applying, you changed this as follows:

            ret = dwc_pcie_rasdes_debugfs_init(pci, dir);
    -       if (ret)
    -               dev_dbg(dev, "RASDES debugfs init failed\n");
    +       if (ret) {
    +               dev_err(dev, "failed to initialize RAS DES debugfs\n");
    +               return ret;
    +       }

            return 0;

Hence this is now a fatal error, causing the probe to fail.
Unfortunately something fails during cleanup:

    pcie-rcar-gen4 e65d0000.pcie: failed to initialize RAS DES debugfs
    ------------[ cut here ]------------
    WARNING: CPU: 3 PID: 36 at kernel/irq/irqdomain.c:393 irq_domain_remove+0xa8/0xb0
    CPU: 3 UID: 0 PID: 36 Comm: kworker/u16:1 Not tainted 6.14.0-rc1-arm64-renesas-00134-g12c8c1363538 #2884
    Hardware name: Renesas Gray Hawk Single board based on r8a779h0 (DT)
    Workqueue: async async_run_entry_fn
    pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : irq_domain_remove+0xa8/0xb0
    lr : irq_domain_remove+0x2c/0xb0
    sp : ffff8000819b3b10
    x29: ffff8000819b3b10 x28: 0000000000000000 x27: 0000000000000000
    x26: ffff00044011d800 x25: ffff80008053294c x24: ffff000441740400
    x23: ffff0004413a30f0 x22: ffff0004413a3130 x21: ffff0004413a3130
    x20: ffff8000815c0ec8 x19: ffff0004415f8240 x18: 00000000ffffffff
    x17: 6775626564205345 x16: 0000000000000020 x15: ffff8000819b37b0
    x14: 0000000000000004 x13: ffff8000813e9dd8 x12: 0000000000000000
    x11: ffff0004404b6448 x10: ffff800080e85400 x9 : 1fffe00088334301
    x8 : 0000000000000001 x7 : ffff0004419a1800 x6 : ffff0004419a1808
    x5 : ffff000441349030 x4 : fffffffffffffdc1 x3 : 0000000000000000
    x2 : ffff0004403e0000 x1 : 0000000000000000 x0 : ffff00044134f630
    Call trace:
     irq_domain_remove+0xa8/0xb0 (P)
     dw_pcie_host_init+0x394/0x710
     rcar_gen4_pcie_probe+0x104/0x2f8
     platform_probe+0x64/0xbc
     really_probe+0xb8/0x294
     __driver_probe_device+0x74/0x124
     driver_probe_device+0x3c/0x158
     __device_attach_driver+0xd4/0x154
     bus_for_each_drv+0x84/0xe0
     __device_attach_async_helper+0xac/0xd0
     async_run_entry_fn+0x30/0xd8
     process_one_work+0x144/0x280
     worker_thread+0x2c4/0x3cc
     kthread+0x128/0x1e0
     ret_from_fork+0x10/0x20
    ---[ end trace 0000000000000000 ]---

Worse, the PCI bus is still registered, so running "lspci" causes an Oops:

    Unable to handle kernel NULL pointer dereference at virtual address 0000000000000004
    Mem abort info:
      ESR = 0x0000000096000004
      EC = 0x25: DABT (current EL), IL = 32 bits
      SET = 0, FnV = 0
      EA = 0, S1PTW = 0
      FSC = 0x04: level 0 translation fault
    Data abort info:
      ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
      CM = 0, WnR = 0, TnD = 0, TagAccess = 0
      GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
    user pgtable: 4k pages, 48-bit VAs, pgdp=0000000483b53000
    [0000000000000004] pgd=0000000000000000, p4d=0000000000000000
    Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
    CPU: 3 UID: 0 PID: 707 Comm: lspci Tainted: G W 6.14.0-rc1-arm64-renesas-00134-g12c8c1363538 #2884
    Tainted: [W]=WARN
    Hardware name: Renesas Gray Hawk Single board based on r8a779h0 (DT)
    pstate: 204000c5 (nzCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    pc : pci_generic_config_read+0x34/0xac
    lr : pci_generic_config_read+0x20/0xac
    sp : ffff8000825cbbf0
    x29: ffff8000825cbbf0 x28: ffff0004491c4b84 x27: 0000000000000004
    x26: 000000000000000f x25: ffff0004491c4b80 x24: 0000000000000040
    x23: 0000000000000040 x22: ffff8000825cbc64 x21: ffff8000816fb4f8
    x20: ffff8000825cbc14 x19: 0000000000000004 x18: 0000000000000000
    x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
    x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
    x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
    x8 : 0000000000000000 x7 : ffff000442c653c0 x6 : ffff8000805163d0
    x5 : ffff8000804f3334 x4 : ffff8000825cbc14 x3 : ffff800080531990
    x2 : 0000000000000004 x1 : 0000000000000000 x0 : 0000000000000004
    Call trace:
     pci_generic_config_read+0x34/0xac (P)
     pci_user_read_config_dword+0x78/0x118
     pci_read_config+0xe4/0x29c
     sysfs_kf_bin_read+0x8c/0x9c
     kernfs_fop_read_iter+0x9c/0x19c
     vfs_read+0x24c/0x330
     __arm64_sys_pread64+0xac/0xc8
     invoke_syscall+0x44/0x100
     el0_svc_common.constprop.0+0x3c/0xd4
     do_el0_svc+0x18/0x20
     el0_svc+0x24/0xa8
     el0t_64_sync_handler+0x104/0x130
     el0t_64_sync+0x154/0x158
    Code: 7100067f 540002a0 71000a7f 54000160 (b9400000)
    ---[ end trace 0000000000000000 ]---
    note: lspci[707] exited with irqs disabled
    note: lspci[707] exited with preempt_count 1
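
Both traces point at an error path that leaves earlier setup steps only
partially undone. As a general illustration, here is a minimal userspace
sketch of reverse-order unwinding. It is a model only: the struct, the
function names and the order of the steps are hypothetical and do not
reflect the actual dw_pcie_host_init() code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical resources standing in for an IRQ domain and a PCI bus. */
struct host_model {
	bool irq_domain_ready;
	bool bus_registered;
};

/* Hypothetical late init step that can fail, e.g. an optional debug feature. */
static int late_step_stub(bool fail)
{
	return fail ? -ENODEV : 0;
}

static int host_init_model(struct host_model *h, bool late_step_fails)
{
	int ret;

	h->irq_domain_ready = true;	/* step 1 */
	h->bus_registered = true;	/* step 2 */

	ret = late_step_stub(late_step_fails);	/* step 3 */
	if (ret)
		goto err_unregister_bus;

	return 0;

err_unregister_bus:
	/* Undo in reverse order, so nothing stays visible to users. */
	h->bus_registered = false;	/* undo step 2 */
	h->irq_domain_ready = false;	/* undo step 1 */
	return ret;
}
```

If the error path skipped the "undo step 2" line, the model would report
failure while the bus stayed registered, which is the kind of state in
which a later user access can hit freed or never-initialized data.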

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds




