Re: [PATCH] PCI: starfive: Enable PCIe controller's PM runtime before probing host bridge

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Oct 10, 2024 at 01:29:50PM -0700, Mayank Rana wrote:
> Commit 02787a3b4d10 ("PCI/PM: Enable runtime power management for host
> bridges") enables runtime PM for host bridge enforcing dependency chain
> between PCIe controller, host bridge and endpoint devices. With this,
> Starfive PCIe controller driver's probe enables host bridge (child device)
> PM runtime before parent's PM runtime (Starfive PCIe controller device)
> causing below warning and callstack:

I don't want the bisection hole that would result if we kept
02787a3b4d10 ("PCI/PM: Enable runtime power management for host
bridges") and applied this patch on top of it.

If this is the fix, we'll apply it *first*, followed by 02787a3b4d10
(which will obviously become a different commit), so the locking
problem below described below should never exist in -next or the
upstream tree.

So we need to audit other drivers to make sure they don't have the
same problem as starfive, then make a series with this patch and any
others that need similar fixes, followed by the "PCI/PM: Enable
runtime power management for host bridges" patch.

Presumably this patch fixes a problem all by itself, even without
considering 02787a3b4d10, so the commit message should describe that
problem.

> pcie-starfive 940000000.pcie: Enabling runtime PM for inactive device
> with active children
> 
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.12.0-rc1+ #15438 Not tainted
> ------------------------------------------------------
> systemd-udevd/159 is trying to acquire lock:
> ffffffff81822520 (console_owner){-.-.}-{0:0}, at:
> console_lock_spinning_enable+0x3a/0x60
> 
> but task is already holding lock:
> ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
> pm_runtime_enable+0x1e/0xb6
> 
> which lock already depends on the new lock.
> 
> the existing dependency chain (in reverse order) is:
> 
> -> #2 (&dev->power.lock){-...}-{2:2}:
>         lock_acquire.part.0+0xa2/0x1d4
>         lock_acquire+0x44/0x5a
>         _raw_spin_lock_irqsave+0x3a/0x64
>         __pm_runtime_resume+0x40/0x86
>         __uart_start+0x40/0xb2
>         uart_write+0x90/0x220
>         n_tty_write+0x10a/0x40e
>         file_tty_write.constprop.0+0x10c/0x230
>         redirected_tty_write+0x84/0xbc
>         do_iter_readv_writev+0x100/0x166
>         vfs_writev+0xc6/0x398
>         do_writev+0x5c/0xca
>         __riscv_sys_writev+0x16/0x1e
>         do_trap_ecall_u+0x1b6/0x1e2
>         _new_vmalloc_restore_context_a0+0xc2/0xce
> 
> -> #1 (&port_lock_key){-.-.}-{2:2}:
>         lock_acquire.part.0+0xa2/0x1d4
>         lock_acquire+0x44/0x5a
>         _raw_spin_lock_irqsave+0x3a/0x64
>         serial8250_console_write+0x2a0/0x474
>         univ8250_console_write+0x22/0x2a
>         console_flush_all+0x2f6/0x3c8
>         console_unlock+0x80/0x1a8
>         vprintk_emit+0x10e/0x2e0
>         vprintk_default+0x16/0x1e
>         vprintk+0x1e/0x3c
>         _printk+0x36/0x50
>         register_console+0x292/0x418
>         serial_core_register_port+0x6d6/0x6dc
>         serial_ctrl_register_port+0xc/0x14
>         uart_add_one_port+0xc/0x14
>         serial8250_register_8250_port+0x288/0x428
>         dw8250_probe+0x422/0x518
>         platform_probe+0x4e/0x92
>         really_probe+0x10a/0x2da
>         __driver_probe_device.part.0+0xb2/0xe8
>         driver_probe_device+0x78/0xc4
>         __device_attach_driver+0x66/0xc6
>         bus_for_each_drv+0x5c/0xb0
>         __device_attach+0x84/0x13c
>         device_initial_probe+0xe/0x16
>         bus_probe_device+0x88/0x8a
>         deferred_probe_work_func+0xd4/0xee
>         process_one_work+0x1e0/0x534
>         worker_thread+0x166/0x2cc
>         kthread+0xc4/0xe0
>         ret_from_fork+0xe/0x18
> 
> -> #0 (console_owner){-.-.}-{0:0}:
>         check_noncircular+0x10e/0x122
>         __lock_acquire+0x105c/0x1f4a
>         lock_acquire.part.0+0xa2/0x1d4
>         lock_acquire+0x44/0x5a
>         console_lock_spinning_enable+0x58/0x60
>         console_flush_all+0x2cc/0x3c8
>         console_unlock+0x80/0x1a8
>         vprintk_emit+0x10e/0x2e0
>         dev_vprintk_emit+0xea/0x112
>         dev_printk_emit+0x2e/0x48
>         __dev_printk+0x40/0x5c
>         _dev_warn+0x46/0x60
>         pm_runtime_enable+0x98/0xb6
>         starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
>         platform_probe+0x4e/0x92
>         really_probe+0x10a/0x2da
>         __driver_probe_device.part.0+0xb2/0xe8
>         driver_probe_device+0x78/0xc4
>         __driver_attach+0x54/0x162
>         bus_for_each_dev+0x58/0xa4
>         driver_attach+0x1a/0x22
>         bus_add_driver+0xec/0x1ce
>         driver_register+0x3e/0xd8
>         __platform_driver_register+0x1c/0x24
>         starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
>         do_one_initcall+0x5e/0x28c
>         do_init_module+0x52/0x1ba
>         load_module+0x1440/0x18f0
>         init_module_from_file+0x76/0xae
>         idempotent_init_module+0x18c/0x24a
>         __riscv_sys_finit_module+0x52/0x82
>         do_trap_ecall_u+0x1b6/0x1e2
>         _new_vmalloc_restore_context_a0+0xc2/0xce
> 
> other info that might help us debug this:
> 
> Chain exists of:
>    console_owner --> &port_lock_key --> &dev->power.lock
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(&dev->power.lock);
>                                 lock(&port_lock_key);
>                                 lock(&dev->power.lock);
>    lock(console_owner);
> 
>   *** DEADLOCK ***
> 
> 4 locks held by systemd-udevd/159:
>   #0: ffffffd6c0b3d8f8 (&dev->mutex){....}-{3:3}, at:
> __driver_attach+0x4c/0x162
>   #1: ffffffd6c0b3d980 (&dev->power.lock){-...}-{2:2}, at:
> pm_runtime_enable+0x1e/0xb6
>   #2: ffffffff818223b0 (console_lock){+.+.}-{0:0}, at:
> dev_vprintk_emit+0xea/0x112
>   #3: ffffffff81822448 (console_srcu){....}-{0:0}, at:
> console_flush_all+0x4e/0x3c8
> 
> stack backtrace:
> CPU: 1 UID: 0 PID: 159 Comm: systemd-udevd Not tainted 6.12.0-rc1+ #15438
> Hardware name: StarFive VisionFive 2 v1.2A (DT)
> Call Trace:
> [<ffffffff80006a02>] dump_backtrace+0x1c/0x24
> [<ffffffff80b70b3e>] show_stack+0x2c/0x38
> [<ffffffff80b7f8f8>] dump_stack_lvl+0x7a/0xb4
> [<ffffffff80b7f946>] dump_stack+0x14/0x1c
> [<ffffffff8007fbc2>] print_circular_bug+0x2aa/0x350
> [<ffffffff8007fd76>] check_noncircular+0x10e/0x122
> [<ffffffff80082a3c>] __lock_acquire+0x105c/0x1f4a
> [<ffffffff80084148>] lock_acquire.part.0+0xa2/0x1d4
> [<ffffffff800842be>] lock_acquire+0x44/0x5a
> [<ffffffff8008b3e8>] console_lock_spinning_enable+0x58/0x60
> [<ffffffff8008c0c2>] console_flush_all+0x2cc/0x3c8
> [<ffffffff8008c23e>] console_unlock+0x80/0x1a8
> [<ffffffff8008c710>] vprintk_emit+0x10e/0x2e0
> [<ffffffff80b79ec8>] dev_vprintk_emit+0xea/0x112
> [<ffffffff80b79f1e>] dev_printk_emit+0x2e/0x48
> [<ffffffff80b7a006>] __dev_printk+0x40/0x5c
> [<ffffffff80b7a2be>] _dev_warn+0x46/0x60
> [<ffffffff807037ae>] pm_runtime_enable+0x98/0xb6
> [<ffffffff02763240>] starfive_pcie_probe+0x12e/0x228 [pcie_starfive]
> [<ffffffff806f83f6>] platform_probe+0x4e/0x92
> [<ffffffff80b7a680>] really_probe+0x10a/0x2da
> [<ffffffff80b7a902>] __driver_probe_device.part.0+0xb2/0xe8
> [<ffffffff806f6112>] driver_probe_device+0x78/0xc4
> [<ffffffff806f6278>] __driver_attach+0x54/0x162
> [<ffffffff806f42e6>] bus_for_each_dev+0x58/0xa4
> [<ffffffff806f5c9e>] driver_attach+0x1a/0x22
> [<ffffffff806f54ce>] bus_add_driver+0xec/0x1ce
> [<ffffffff806f7112>] driver_register+0x3e/0xd8
> [<ffffffff806f80cc>] __platform_driver_register+0x1c/0x24
> [<ffffffff027ea020>] starfive_pcie_driver_init+0x20/0x1000 [pcie_starfive]
> [<ffffffff800027ba>] do_one_initcall+0x5e/0x28c
> [<ffffffff800bb4d8>] do_init_module+0x52/0x1ba
> [<ffffffff800bcc8a>] load_module+0x1440/0x18f0
> [<ffffffff800bd2f0>] init_module_from_file+0x76/0xae
> [<ffffffff800bd4b4>] idempotent_init_module+0x18c/0x24a
> [<ffffffff800bd5fc>] __riscv_sys_finit_module+0x52/0x82
> [<ffffffff80b804a0>] do_trap_ecall_u+0x1b6/0x1e2
> [<ffffffff80b8c536>] _new_vmalloc_restore_context_a0+0xc2/0xce
> 
> Fix this issue by enabling starfive pcie controller device's PM runtime
> status before calling into pci_host_probe() through plda_pcie_host_init().
> 
> Reported-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Fixes: 02787a3b4d10 ("PCI/PM: Enable runtime power management for host bridges")
> Tested-by: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Signed-off-by: Mayank Rana <quic_mrana@xxxxxxxxxxx>
> ---
>  drivers/pci/controller/plda/pcie-starfive.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/plda/pcie-starfive.c b/drivers/pci/controller/plda/pcie-starfive.c
> index 0567ec373a3e..e73c1b7bc8ef 100644
> --- a/drivers/pci/controller/plda/pcie-starfive.c
> +++ b/drivers/pci/controller/plda/pcie-starfive.c
> @@ -404,6 +404,9 @@ static int starfive_pcie_probe(struct platform_device *pdev)
>  	if (ret)
>  		return ret;
>  
> +	pm_runtime_enable(&pdev->dev);
> +	pm_runtime_get_sync(&pdev->dev);
> +
>  	plda->host_ops = &sf_host_ops;
>  	plda->num_events = PLDA_MAX_EVENT_NUM;
>  	/* mask doorbell event */
> @@ -413,11 +416,12 @@ static int starfive_pcie_probe(struct platform_device *pdev)
>  	plda->events_bitmap <<= PLDA_NUM_DMA_EVENTS;
>  	ret = plda_pcie_host_init(&pcie->plda, &starfive_pcie_ops,
>  				  &stf_pcie_event);
> -	if (ret)
> +	if (ret) {
> +		pm_runtime_put_sync(&pdev->dev);
> +		pm_runtime_disable(&pdev->dev);
>  		return ret;
> +	}
>  
> -	pm_runtime_enable(&pdev->dev);
> -	pm_runtime_get_sync(&pdev->dev);
>  	platform_set_drvdata(pdev, pcie);
>  
>  	return 0;
> -- 
> 2.25.1
> 




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux