Re: [PATCH 1/2] PCI: mediatek-gen3: Stop acquiring spinlocks in {suspend,resume}_noirq

Oliver Neukum <oneukum@xxxxxxxx> · Mon, 8 May 2023 09:44:13 +0200

On 04.05.23 13:35, AngeloGioacchino Del Regno wrote:

Hi,

looking at your patch I am afraid there is an issue.

In mtk_pcie_suspend_noirq() and mtk_pcie_resume_noirq() we are,
respectively, disabling and enabling generation of interrupts and
then saving and restoring the enabled interrupts register: since
we're using noirq PM callbacks, that can be safely done without
holding any spin lock.

Why? You can still race with another CPU in task context.
That is if you say that you do not need locking to touch
PCIE_INT_ENABLE_REG that is fine, but then why do you remove
it from one place only?
It is also touched in mtk_pcie_probe() at a minimum.

That was noticed because of, and solves, the following issue:

<4>[   74.185982] ========================================================
<4>[   74.192629] WARNING: possible irq lock inversion dependency detected
<4>[   74.199276] 6.3.0-next-20230428+ #51 Tainted: G        W
<4>[   74.205664] --------------------------------------------------------
<4>[   74.212309] systemd-sleep/809 just changed the state of lock:
<4>[   74.218347] ffff65a5c34c65a0 (&pcie->irq_lock){+...}-{2:2}, at: mtk_pcie_resume+0x50/0xa8
<4>[   74.226870] but this lock was taken by another, HARDIRQ-safe lock in the past:
<4>[   74.234389]  (&irq_desc_lock_class){-.-.}-{2:2}
<4>[   74.234409]
<4>[   74.234409]
<4>[   74.234409] and interrupts could create inverse lock ordering between them.
<4>[   74.234409]
<4>[   74.251704]
<4>[   74.251704] other info that might help us debug this:
<4>[   74.258785]  Possible interrupt unsafe locking scenario:
<4>[   74.258785]
<4>[   74.266126]        CPU0                    CPU1
<4>[   74.270942]        ----                    ----
<4>[   74.275758]   lock(&pcie->irq_lock);

Lock A

<4>[   74.279627]                                local_irq_disable();

strictly speaking irrelevant

<4>[   74.285836]                                lock(&irq_desc_lock_class);

lock B

<4>[   74.292667]                                lock(&pcie->irq_lock);

lock A

<4>[   74.299061]   <Interrupt>

You do not need that interrupt.

<4>[   74.301960]     lock(&irq_desc_lock_class);

lock B

<4>[   74.306438]
<4>[   74.306438]  *** DEADLOCK ***