Re: [PATCH] PCI: starfive: Enable PCIe controller's PM runtime before probing host bridge

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Bjorn

On 10/10/2024 1:52 PM, Bjorn Helgaas wrote:
On Thu, Oct 10, 2024 at 01:29:50PM -0700, Mayank Rana wrote:
Commit 02787a3b4d10 ("PCI/PM: Enable runtime power management for host
bridges") enables runtime PM for host bridge enforcing dependency chain
between PCIe controller, host bridge and endpoint devices. With this,
Starfive PCIe controller driver's probe enables host bridge (child device)
PM runtime before parent's PM runtime (Starfive PCIe controller device)
causing below warning and callstack:

I don't want the bisection hole that would result if we kept
02787a3b4d10 ("PCI/PM: Enable runtime power management for host
bridges") and applied this patch on top of it.

If this is the fix, we'll apply it *first*, followed by 02787a3b4d10
(which will obviously become a different commit), so the locking
problem below described below should never exist in -next or the
upstream tree.

So we need to audit other drivers to make sure they don't have the
same problem as starfive, then make a series with this patch and any
others that need similar fixes, followed by the "PCI/PM: Enable
runtime power management for host bridges" patch.

Presumably this patch fixes a problem all by itself, even without
considering 02787a3b4d10, so the commit message should describe that
problem.

Ok. Thank you for quick response and above suggestion.
I shall reword this patch commit text, and resend it.

Regards,
Mayank
pcie-starfive 940000000.pcie: Enabling runtime PM for inactive device
with active children

======================================================
WARNING: possible circular locking dependency detected
6.12.0-rc1+ #15438 Not tainted
------------------------------------------------------
systemd-udevd/159 is trying to acquire lock:
ffffffff81822520 (console_owner){-.-.}-{0:0}, at:
console_lock_spinning_enable+0x3a/0x60

but task is already holding lock:
ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
pm_runtime_enable+0x1e/0xb6

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #2 (&dev->power.lock){-...}-{2:2}:
         lock_acquire.part.0+0xa2/0x1d4
         lock_acquire+0x44/0x5a
         _raw_spin_lock_irqsave+0x3a/0x64
         __pm_runtime_resume+0x40/0x86
         __uart_start+0x40/0xb2
         uart_write+0x90/0x220
         n_tty_write+0x10a/0x40e
         file_tty_write.constprop.0+0x10c/0x230
         redirected_tty_write+0x84/0xbc
         do_iter_readv_writev+0x100/0x166
         vfs_writev+0xc6/0x398
         do_writev+0x5c/0xca
         __riscv_sys_writev+0x16/0x1e
         do_trap_ecall_u+0x1b6/0x1e2
         _new_vmalloc_restore_context_a0+0xc2/0xce

-> #1 (&port_lock_key){-.-.}-{2:2}:
         lock_acquire.part.0+0xa2/0x1d4
         lock_acquire+0x44/0x5a
         _raw_spin_lock_irqsave+0x3a/0x64
         serial8250_console_write+0x2a0/0x474
         univ8250_console_write+0x22/0x2a
         console_flush_all+0x2f6/0x3c8
         console_unlock+0x80/0x1a8
         vprintk_emit+0x10e/0x2e0
         vprintk_default+0x16/0x1e
         vprintk+0x1e/0x3c
         _printk+0x36/0x50
         register_console+0x292/0x418
         serial_core_register_port+0x6d6/0x6dc
         serial_ctrl_register_port+0xc/0x14
         uart_add_one_port+0xc/0x14
         serial8250_register_8250_port+0x288/0x428
         dw8250_probe+0x422/0x518
         platform_probe+0x4e/0x92
         really_probe+0x10a/0x2da
         __driver_probe_device.part.0+0xb2/0xe8
         driver_probe_device+0x78/0xc4
         __device_attach_driver+0x66/0xc6
         bus_for_each_drv+0x5c/0xb0
         __device_attach+0x84/0x13c
         device_initial_probe+0xe/0x16
         bus_probe_device+0x88/0x8a
         deferred_probe_work_func+0xd4/0xee
         process_one_work+0x1e0/0x534
         worker_thread+0x166/0x2cc
         kthread+0xc4/0xe0
         ret_from_fork+0xe/0x18

-> #0 (console_owner){-.-.}-{0:0}:
         check_noncircular+0x10e/0x122
         __lock_acquire+0x105c/0x1f4a
         lock_acquire.part.0+0xa2/0x1d4
         lock_acquire+0x44/0x5a
         console_lock_spinning_enable+0x58/0x60
         console_flush_all+0x2cc/0x3c8
         console_unlock+0x80/0x1a8
         vprintk_emit+0x10e/0x2e0
         dev_vprintk_emit+0xea/0x112
         dev_printk_emit+0x2e/0x48
         __dev_printk+0x40/0x5c
         _dev_warn+0x46/0x60
         pm_runtime_enable+0x98/0xb6
         starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
         platform_probe+0x4e/0x92
         really_probe+0x10a/0x2da
         __driver_probe_device.part.0+0xb2/0xe8
         driver_probe_device+0x78/0xc4
         __driver_attach+0x54/0x162
         bus_for_each_dev+0x58/0xa4
         driver_attach+0x1a/0x22
         bus_add_driver+0xec/0x1ce
         driver_register+0x3e/0xd8
         __platform_driver_register+0x1c/0x24
         starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
         do_one_initcall+0x5e/0x28c
         do_init_module+0x52/0x1ba
         load_module+0x1440/0x18f0
         init_module_from_file+0x76/0xae
         idempotent_init_module+0x18c/0x24a
         __riscv_sys_finit_module+0x52/0x82
         do_trap_ecall_u+0x1b6/0x1e2
         _new_vmalloc_restore_context_a0+0xc2/0xce

other info that might help us debug this:

Chain exists of:
    console_owner --> &port_lock_key --> &dev->power.lock

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(&dev->power.lock);
                                 lock(&port_lock_key);
                                 lock(&dev->power.lock);
    lock(console_owner);

   *** DEADLOCK ***

4 locks held by systemd-udevd/159:
   #0: ffffffd6c0b3d8f8 (&dev->mutex){....}-{3:3}, at:
__driver_attach+0x4c/0x162
   #1: ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
pm_runtime_enable+0x1e/0xb6
   #2: ffffffff818223b0 (console_lock){+.+.}-{0:0}, at:
dev_vprintk_emit+0xea/0x112
   #3: ffffffff81822448 (console_srcu){....}-{0:0}, at:
console_flush_all+0x4e/0x3c8

stack backtrace:
CPU: 1 UID: 0 PID: 159 Comm: systemd-udevd Not tainted 6.12.0-rc1+ #15438
Hardware name: StarFive VisionFive 2 v1.2A (DT)
Call Trace:
[<ffffffff80006a02>] dump_backtrace+0x1c/0x24
[<ffffffff80b70b3e>] show_stack+0x2c/0x38
[<ffffffff80b7f8f8>] dump_stack_lvl+0x7a/0xb4
[<ffffffff80b7f946>] dump_stack+0x14/0x1c
[<ffffffff8007fbc2>] print_circular_bug+0x2aa/0x350
[<ffffffff8007fd76>] check_noncircular+0x10e/0x122
[<ffffffff80082a3c>] __lock_acquire+0x105c/0x1f4a
[<ffffffff80084148>] lock_acquire.part.0+0xa2/0x1d4
[<ffffffff800842be>] lock_acquire+0x44/0x5a
[<ffffffff8008b3e8>] console_lock_spinning_enable+0x58/0x60
[<ffffffff8008c0c2>] console_flush_all+0x2cc/0x3c8
[<ffffffff8008c23e>] console_unlock+0x80/0x1a8
[<ffffffff8008c710>] vprintk_emit+0x10e/0x2e0
[<ffffffff80b79ec8>] dev_vprintk_emit+0xea/0x112
[<ffffffff80b79f1e>] dev_printk_emit+0x2e/0x48
[<ffffffff80b7a006>] __dev_printk+0x40/0x5c
[<ffffffff80b7a2be>] _dev_warn+0x46/0x60
[<ffffffff807037ae>] pm_runtime_enable+0x98/0xb6
[<ffffffff02763240>] starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
[<ffffffff806f83f6>] platform_probe+0x4e/0x92
[<ffffffff80b7a680>] really_probe+0x10a/0x2da
[<ffffffff80b7a902>] __driver_probe_device.part.0+0xb2/0xe8
[<ffffffff806f6112>] driver_probe_device+0x78/0xc4
[<ffffffff806f6278>] __driver_attach+0x54/0x162
[<ffffffff806f42e6>] bus_for_each_dev+0x58/0xa4
[<ffffffff806f5c9e>] driver_attach+0x1a/0x22
[<ffffffff806f54ce>] bus_add_driver+0xec/0x1ce
[<ffffffff806f7112>] driver_register+0x3e/0xd8
[<ffffffff806f80cc>] __platform_driver_register+0x1c/0x24
[<ffffffff027ea020>] starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
[<ffffffff800027ba>] do_one_initcall+0x5e/0x28c
[<ffffffff800bb4d8>] do_init_module+0x52/0x1ba
[<ffffffff800bcc8a>] load_module+0x1440/0x18f0
[<ffffffff800bd2f0>] init_module_from_file+0x76/0xae
[<ffffffff800bd4b4>] idempotent_init_module+0x18c/0x24a
[<ffffffff800bd5fc>] __riscv_sys_finit_module+0x52/0x82
[<ffffffff80b804a0>] do_trap_ecall_u+0x1b6/0x1e2
[<ffffffff80b8c536>] _new_vmalloc_restore_context_a0+0xc2/0xce

Fix this issue by enabling starfive pcie controller device's PM runtime
status before calling into pci_host_probe() through plda_pcie_host_init().

Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Fixes: 02787a3b4d10 ("PCI/PM: Enable runtime power management for host bridges")
Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
Signed-off-by: Mayank Rana <quic_mrana@xxxxxxxxxxx>
---
  drivers/pci/controller/plda/pcie-starfive.c | 10 +++++++---
  1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/controller/plda/pcie-starfive.c b/drivers/pci/controller/plda/pcie-starfive.c
index 0567ec373a3e..e73c1b7bc8ef 100644
--- a/drivers/pci/controller/plda/pcie-starfive.c
+++ b/drivers/pci/controller/plda/pcie-starfive.c
@@ -404,6 +404,9 @@ static int starfive_pcie_probe(struct platform_device *pdev)
  	if (ret)
  		return ret;
+ pm_runtime_enable(&pdev->dev);
+	pm_runtime_get_sync(&pdev->dev);
+
  	plda->host_ops = &sf_host_ops;
  	plda->num_events = PLDA_MAX_EVENT_NUM;
  	/* mask doorbell event */
@@ -413,11 +416,12 @@ static int starfive_pcie_probe(struct platform_device *pdev)
  	plda->events_bitmap <<= PLDA_NUM_DMA_EVENTS;
  	ret = plda_pcie_host_init(&pcie->plda, &starfive_pcie_ops,
  				  &stf_pcie_event);
-	if (ret)
+	if (ret) {
+		pm_runtime_put_sync(&pdev->dev);
+		pm_runtime_disable(&pdev->dev);
  		return ret;
+	}
- pm_runtime_enable(&pdev->dev);
-	pm_runtime_get_sync(&pdev->dev);
  	platform_set_drvdata(pdev, pcie);
return 0;
--
2.25.1





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux