Hi On 10.10.2024 00:20, Mayank Rana wrote: > Hi > > It seems that PCIe controller (pcie-starfive.c driver, > starfive_pcie_probe()) is setting (child device) PCIe host bridge > device's runtime state as active going through plda_pcie_host_init() > -> pci_host_probe() before parent i.e. PCIe controller device itself > is being mark as active. > > log is showing below error from pm_runtime_enable() context: > dev_warn(dev, "Enabling runtime PM for inactive device with active > children\n"); > > Is it possible to try below change to see if it helps ? > ====== > diff --git a/drivers/pci/controller/plda/pcie-starfive.c > b/drivers/pci/controller/plda/pcie-starfive.c > index 0567ec373a3e..10bcd7e2e958 100644 > --- a/drivers/pci/controller/plda/pcie-starfive.c > +++ b/drivers/pci/controller/plda/pcie-starfive.c > @@ -404,6 +404,9 @@ static int starfive_pcie_probe(struct > platform_device *pdev) > if (ret) > return ret; > > + pm_runtime_enable(&pdev->dev); > + pm_runtime_get_sync(&pdev->dev); > + > plda->host_ops = &sf_host_ops; > plda->num_events = PLDA_MAX_EVENT_NUM; > /* mask doorbell event */ > @@ -416,8 +419,6 @@ static int starfive_pcie_probe(struct > platform_device *pdev) > if (ret) > return ret; > > - pm_runtime_enable(&pdev->dev); > - pm_runtime_get_sync(&pdev->dev); > platform_set_drvdata(pdev, pcie); > > return 0; > Yes, this fixes both issues (runtime pm warning and lockdep splat). Thanks! Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx> > Regards, > Mayank > > On 10/9/2024 11:10 AM, Marek Szyprowski wrote: >> Hi Krishna, >> >> On 03.10.2024 08:02, Krishna chaitanya chundru wrote: >>> The Controller driver is the parent device of the PCIe host bridge, >>> PCI-PCI bridge and PCIe endpoint as shown below. >>> >>> PCIe controller(Top level parent & parent of host bridge) >>> | >>> v >>> PCIe Host bridge(Parent of PCI-PCI bridge) >>> | >>> v >>> PCI-PCI bridge(Parent of endpoint driver) >>> | >>> v >>> PCIe endpoint driver >>> >>> Now, when the controller device goes to runtime suspend, PM framework >>> will check the runtime PM state of the child device (host bridge) and >>> will find it to be disabled. So it will allow the parent (controller >>> device) to go to runtime suspend. Only if the child device's state was >>> 'active' it will prevent the parent to get suspended. >>> >>> It is a property of the runtime PM framework that it can only >>> follow continuous dependency chains. That is, if there is a device >>> with runtime PM disabled in a dependency chain, runtime PM cannot be >>> enabled for devices below it and above it in that chain both at the >>> same time. >>> >>> Since runtime PM is disabled for host bridge, the state of the child >>> devices under the host bridge is not taken into account by PM framework >>> for the top level parent, PCIe controller. So PM framework, allows >>> the controller driver to enter runtime PM irrespective of the state >>> of the devices under the host bridge. And this causes the topology >>> breakage and also possible PM issues like controller driver goes to >>> runtime suspend while endpoint driver is doing some transfers. >>> >>> Because of the above, in order to enable runtime PM for a PCIe >>> controller device, one needs to ensure that runtime PM is enabled for >>> all devices in every dependency chain between it and any PCIe endpoint >>> (as runtime PM is enabled for PCIe endpoints). >>> >>> This means that runtime PM needs to be enabled for the host bridge >>> device, which is present in all of these dependency chains. >>> >>> After this change, the host bridge device will be runtime-suspended >>> by the runtime PM framework automatically after suspending its last >>> child and it will be runtime-resumed automatically before resuming its >>> first child which will allow the runtime PM framework to track >>> dependencies between the host bridge device and all of its >>> descendants. >>> >>> Signed-off-by: Krishna chaitanya chundru <quic_krichai@xxxxxxxxxxx> >>> Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx> >> >> This patch landed in today's linux-next as commit 02787a3b4d10 ("PCI/PM: >> Enable runtime power management for host bridges"). In my tests I found >> that it triggers a warning on StarFive VisionFive2 RISC-V board. It >> looks that some more changes are needed in the dwc-pci driver or so. >> There is a message from runtime pm subsystem about inactive device with >> active children and suspicious locking pattern. Here is the log I >> observed on that board: >> >> ---->8--- >> >> pcie-starfive 940000000.pcie: port link up >> pcie-starfive 940000000.pcie: PCI host bridge to bus 0000:00 >> pci_bus 0000:00: root bus resource [bus 00-ff] >> pci_bus 0000:00: root bus resource [mem 0x30000000-0x37ffffff] >> pci_bus 0000:00: root bus resource [mem 0x900000000-0x93fffffff pref] >> pci 0000:00:00.0: [1556:1111] type 01 class 0x060400 PCIe Root Port >> pci 0000:00:00.0: PCI bridge to [bus 00] >> pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff] >> pci 0000:00:00.0: bridge window [mem 0x00000000-0x000fffff 64bit pref] >> pci 0000:00:00.0: supports D1 D2 >> pci 0000:00:00.0: PME# supported from D0 D1 D2 D3hot D3cold >> pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), >> reconfiguring >> pci 0000:01:00.0: [1106:3483] type 00 class 0x0c0330 PCIe Endpoint >> pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x00000fff 64bit] >> pci 0000:01:00.0: PME# supported from D0 D3cold >> pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01 >> pci 0000:00:00.0: bridge window [mem 0x30000000-0x300fffff]: assigned >> pci 0000:01:00.0: BAR 0 [mem 0x30000000-0x30000fff 64bit]: assigned >> pci 0000:00:00.0: PCI bridge to [bus 01] >> pci 0000:00:00.0: bridge window [mem 0x30000000-0x300fffff] >> pci_bus 0000:00: resource 4 [mem 0x30000000-0x37ffffff] >> pci_bus 0000:00: resource 5 [mem 0x900000000-0x93fffffff pref] >> pci_bus 0000:01: resource 1 [mem 0x30000000-0x300fffff] >> pcieport 0000:00:00.0: enabling device (0000 -> 0002) >> pcieport 0000:00:00.0: PME: Signaling with IRQ 53 >> pci 0000:01:00.0: enabling device (0000 -> 0002) >> xhci_hcd 0000:01:00.0: xHCI Host Controller >> xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1 >> xhci_hcd 0000:01:00.0: hcc params 0x002841eb hci version 0x100 quirks >> 0x0000000000000890 >> xhci_hcd 0000:01:00.0: xHCI Host Controller >> xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2 >> xhci_hcd 0000:01:00.0: Host supports USB 3.0 SuperSpeed >> hub 1-0:1.0: USB hub found >> hub 1-0:1.0: 1 port detected >> hub 2-0:1.0: USB hub found >> hub 2-0:1.0: 4 ports detected >> pcie-starfive 940000000.pcie: Enabling runtime PM for inactive device >> with active children >> >> ====================================================== >> WARNING: possible circular locking dependency detected >> 6.12.0-rc1+ #15438 Not tainted >> ------------------------------------------------------ >> systemd-udevd/159 is trying to acquire lock: >> ffffffff81822520 (console_owner){-.-.}-{0:0}, at: >> console_lock_spinning_enable+0x3a/0x60 >> >> but task is already holding lock: >> ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at: >> pm_runtime_enable+0x1e/0xb6 >> >> which lock already depends on the new lock. >> >> >> the existing dependency chain (in reverse order) is: >> >> -> #2 (&dev->power.lock){-...}-{2:2}: >> lock_acquire.part.0+0xa2/0x1d4 >> lock_acquire+0x44/0x5a >> _raw_spin_lock_irqsave+0x3a/0x64 >> __pm_runtime_resume+0x40/0x86 >> __uart_start+0x40/0xb2 >> uart_write+0x90/0x220 >> n_tty_write+0x10a/0x40e >> file_tty_write.constprop.0+0x10c/0x230 >> redirected_tty_write+0x84/0xbc >> do_iter_readv_writev+0x100/0x166 >> vfs_writev+0xc6/0x398 >> do_writev+0x5c/0xca >> __riscv_sys_writev+0x16/0x1e >> do_trap_ecall_u+0x1b6/0x1e2 >> _new_vmalloc_restore_context_a0+0xc2/0xce >> >> -> #1 (&port_lock_key){-.-.}-{2:2}: >> lock_acquire.part.0+0xa2/0x1d4 >> lock_acquire+0x44/0x5a >> _raw_spin_lock_irqsave+0x3a/0x64 >> serial8250_console_write+0x2a0/0x474 >> univ8250_console_write+0x22/0x2a >> console_flush_all+0x2f6/0x3c8 >> console_unlock+0x80/0x1a8 >> vprintk_emit+0x10e/0x2e0 >> vprintk_default+0x16/0x1e >> vprintk+0x1e/0x3c >> _printk+0x36/0x50 >> register_console+0x292/0x418 >> serial_core_register_port+0x6d6/0x6dc >> serial_ctrl_register_port+0xc/0x14 >> uart_add_one_port+0xc/0x14 >> serial8250_register_8250_port+0x288/0x428 >> dw8250_probe+0x422/0x518 >> platform_probe+0x4e/0x92 >> really_probe+0x10a/0x2da >> __driver_probe_device.part.0+0xb2/0xe8 >> driver_probe_device+0x78/0xc4 >> __device_attach_driver+0x66/0xc6 >> bus_for_each_drv+0x5c/0xb0 >> __device_attach+0x84/0x13c >> device_initial_probe+0xe/0x16 >> bus_probe_device+0x88/0x8a >> deferred_probe_work_func+0xd4/0xee >> process_one_work+0x1e0/0x534 >> worker_thread+0x166/0x2cc >> kthread+0xc4/0xe0 >> ret_from_fork+0xe/0x18 >> >> -> #0 (console_owner){-.-.}-{0:0}: >> check_noncircular+0x10e/0x122 >> __lock_acquire+0x105c/0x1f4a >> lock_acquire.part.0+0xa2/0x1d4 >> lock_acquire+0x44/0x5a >> console_lock_spinning_enable+0x58/0x60 >> console_flush_all+0x2cc/0x3c8 >> console_unlock+0x80/0x1a8 >> vprintk_emit+0x10e/0x2e0 >> dev_vprintk_emit+0xea/0x112 >> dev_printk_emit+0x2e/0x48 >> __dev_printk+0x40/0x5c >> _dev_warn+0x46/0x60 >> pm_runtime_enable+0x98/0xb6 >> starfive_pcie_probe+0x12e/0x228 [pcie_starfive] >> platform_probe+0x4e/0x92 >> really_probe+0x10a/0x2da >> __driver_probe_device.part.0+0xb2/0xe8 >> driver_probe_device+0x78/0xc4 >> __driver_attach+0x54/0x162 >> bus_for_each_dev+0x58/0xa4 >> driver_attach+0x1a/0x22 >> bus_add_driver+0xec/0x1ce >> driver_register+0x3e/0xd8 >> __platform_driver_register+0x1c/0x24 >> starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive] >> do_one_initcall+0x5e/0x28c >> do_init_module+0x52/0x1ba >> load_module+0x1440/0x18f0 >> init_module_from_file+0x76/0xae >> idempotent_init_module+0x18c/0x24a >> __riscv_sys_finit_module+0x52/0x82 >> do_trap_ecall_u+0x1b6/0x1e2 >> _new_vmalloc_restore_context_a0+0xc2/0xce >> >> other info that might help us debug this: >> >> Chain exists of: >> console_owner --> &port_lock_key --> &dev->power.lock >> >> Possible unsafe locking scenario: >> >> CPU0 CPU1 >> ---- ---- >> lock(&dev->power.lock); >> lock(&port_lock_key); >> lock(&dev->power.lock); >> lock(console_owner); >> >> *** DEADLOCK *** >> >> 4 locks held by systemd-udevd/159: >> #0: ffffffd6c0b3d8f8 (&dev->mutex){....}-{3:3}, at: >> __driver_attach+0x4c/0x162 >> #1: ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at: >> pm_runtime_enable+0x1e/0xb6 >> #2: ffffffff818223b0 (console_lock){+.+.}-{0:0}, at: >> dev_vprintk_emit+0xea/0x112 >> #3: ffffffff81822448 (console_srcu){....}-{0:0}, at: >> console_flush_all+0x4e/0x3c8 >> >> stack backtrace: >> CPU: 1 UID: 0 PID: 159 Comm: systemd-udevd Not tainted 6.12.0-rc1+ >> #15438 >> Hardware name: StarFive VisionFive 2 v1.2A (DT) >> Call Trace: >> [<ffffffff80006a02>] dump_backtrace+0x1c/0x24 >> [<ffffffff80b70b3e>] show_stack+0x2c/0x38 >> [<ffffffff80b7f8f8>] dump_stack_lvl+0x7a/0xb4 >> [<ffffffff80b7f946>] dump_stack+0x14/0x1c >> [<ffffffff8007fbc2>] print_circular_bug+0x2aa/0x350 >> [<ffffffff8007fd76>] check_noncircular+0x10e/0x122 >> [<ffffffff80082a3c>] __lock_acquire+0x105c/0x1f4a >> [<ffffffff80084148>] lock_acquire.part.0+0xa2/0x1d4 >> [<ffffffff800842be>] lock_acquire+0x44/0x5a >> [<ffffffff8008b3e8>] console_lock_spinning_enable+0x58/0x60 >> [<ffffffff8008c0c2>] console_flush_all+0x2cc/0x3c8 >> [<ffffffff8008c23e>] console_unlock+0x80/0x1a8 >> [<ffffffff8008c710>] vprintk_emit+0x10e/0x2e0 >> [<ffffffff80b79ec8>] dev_vprintk_emit+0xea/0x112 >> [<ffffffff80b79f1e>] dev_printk_emit+0x2e/0x48 >> [<ffffffff80b7a006>] __dev_printk+0x40/0x5c >> [<ffffffff80b7a2be>] _dev_warn+0x46/0x60 >> [<ffffffff807037ae>] pm_runtime_enable+0x98/0xb6 >> [<ffffffff02763240>] starfive_pcie_probe+0x12e/0x228 [pcie_starfive] >> [<ffffffff806f83f6>] platform_probe+0x4e/0x92 >> [<ffffffff80b7a680>] really_probe+0x10a/0x2da >> [<ffffffff80b7a902>] __driver_probe_device.part.0+0xb2/0xe8 >> [<ffffffff806f6112>] driver_probe_device+0x78/0xc4 >> [<ffffffff806f6278>] __driver_attach+0x54/0x162 >> [<ffffffff806f42e6>] bus_for_each_dev+0x58/0xa4 >> [<ffffffff806f5c9e>] driver_attach+0x1a/0x22 >> [<ffffffff806f54ce>] bus_add_driver+0xec/0x1ce >> [<ffffffff806f7112>] driver_register+0x3e/0xd8 >> [<ffffffff806f80cc>] __platform_driver_register+0x1c/0x24 >> [<ffffffff027ea020>] starfive_pcie_driver_init+0x20/0x1000 >> [pcie_starfive] >> [<ffffffff800027ba>] do_one_initcall+0x5e/0x28c >> [<ffffffff800bb4d8>] do_init_module+0x52/0x1ba >> [<ffffffff800bcc8a>] load_module+0x1440/0x18f0 >> [<ffffffff800bd2f0>] init_module_from_file+0x76/0xae >> [<ffffffff800bd4b4>] idempotent_init_module+0x18c/0x24a >> [<ffffffff800bd5fc>] __riscv_sys_finit_module+0x52/0x82 >> [<ffffffff80b804a0>] do_trap_ecall_u+0x1b6/0x1e2 >> [<ffffffff80b8c536>] _new_vmalloc_restore_context_a0+0xc2/0xce >> pcie-starfive 940000000.pcie: driver: 'pcie-starfive': driver_bound: >> bound to device >> /soc/pcie@940000000 Dropping the fwnode link to >> /soc/pcie@940000000/interrupt-controller >> pcie-starfive 940000000.pcie: Dropping the link to 10210000.phy >> device: 'platform:10210000.phy--platform:940000000.pcie': >> device_unregister >> pcie-starfive 940000000.pcie: Dropping the link to >> 10230000.clock-controller >> device: 'platform:10230000.clock-controller--platform:940000000.pcie': >> device_unregister >> pcie-starfive 940000000.pcie: bus: 'platform': really_probe: bound >> device to driver pcie-starfive >> platform 9c0000000.pcie: bus: 'platform': __driver_probe_device: matched >> device with driver pcie-starfive >> platform 9c0000000.pcie: bus: 'platform': really_probe: probing driver >> pcie-starfive with device >> pcie-starfive 9c0000000.pcie: no init pinctrl state >> pcie-starfive 9c0000000.pcie: no sleep pinctrl state >> pcie-starfive 9c0000000.pcie: no idle pinctrl state >> device: 'phy:phy-10220000.phy.1--platform:9c0000000.pcie': device_add >> devices_kset: Moving 9c0000000.pcie to end of list >> PM: Moving platform:9c0000000.pcie to end of list >> pcie-starfive 9c0000000.pcie: Linked as a consumer to phy-10220000.phy.1 >> pcie-starfive 9c0000000.pcie: host bridge /soc/pcie@9c0000000 ranges: >> pcie-starfive 9c0000000.pcie: MEM 0x0038000000..0x003fffffff -> >> 0x0038000000 >> pcie-starfive 9c0000000.pcie: MEM 0x0980000000..0x09bfffffff -> >> 0x0980000000 >> >> --->8--- >> >> >>> --- >>> Changes in v5: >>> - call pm_runtime_no_callbacks() as suggested by Rafael. >>> - include the commit texts as suggested by Rafael. >>> - Link to v4: >>> https://lore.kernel.org/linux-pci/20240708-runtime_pm-v4-1-c02a3663243b@xxxxxxxxxxx/ >>> Changes in v4: >>> - Changed pm_runtime_enable() to devm_pm_runtime_enable() (suggested >>> by mayank) >>> - Link to v3: >>> https://lore.kernel.org/lkml/20240609-runtime_pm-v3-1-3d0460b49d60@xxxxxxxxxxx/ >>> Changes in v3: >>> - Moved the runtime API call's from the dwc driver to PCI framework >>> as it is applicable for all (suggested by mani) >>> - Updated the commit message. >>> - Link to v2: >>> https://lore.kernel.org/all/20240305-runtime_pm_enable-v2-1-a849b74091d1@xxxxxxxxxxx >>> Changes in v2: >>> - Updated commit message as suggested by mani. >>> - Link to v1: >>> https://lore.kernel.org/r/20240219-runtime_pm_enable-v1-1-d39660310504@xxxxxxxxxxx >>> --- >>> >>> --- >>> drivers/pci/probe.c | 5 +++++ >>> 1 file changed, 5 insertions(+) >>> >>> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c >>> index 4f68414c3086..8409e1dde0d1 100644 >>> --- a/drivers/pci/probe.c >>> +++ b/drivers/pci/probe.c >>> @@ -3106,6 +3106,11 @@ int pci_host_probe(struct pci_host_bridge >>> *bridge) >>> pcie_bus_configure_settings(child); >>> pci_bus_add_devices(bus); >>> + >>> + pm_runtime_set_active(&bridge->dev); >>> + pm_runtime_no_callbacks(&bridge->dev); >>> + devm_pm_runtime_enable(&bridge->dev); >>> + >>> return 0; >>> } >>> EXPORT_SYMBOL_GPL(pci_host_probe); >>> >>> --- >>> base-commit: c02d24a5af66a9806922391493205a344749f2c4 >>> change-id: 20241003-runtime_pm-655d48356c8b >>> >>> Best regards, >> >> Best regards > Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland