On 04.05.23 13:35, AngeloGioacchino Del Regno wrote: Hi, looking at your patch I am afraid there is an issue.
In mtk_pcie_suspend_noirq() and mtk_pcie_resume_noirq() we are, respectively, disabling and enabling generation of interrupts and then saving and restoring the enabled interrupts register: since we're using noirq PM callbacks, that can be safely done without holding any spin lock.
Why? You can still race with another CPU in task context. That is if you say that you do not need locking to touch PCIE_INT_ENABLE_REG that is fine, but then why do you remove it from one place only? It is also touched in mtk_pcie_probe() at a minimum.
That was noticed because of, and solves, the following issue: <4>[ 74.185982] ======================================================== <4>[ 74.192629] WARNING: possible irq lock inversion dependency detected <4>[ 74.199276] 6.3.0-next-20230428+ #51 Tainted: G W <4>[ 74.205664] -------------------------------------------------------- <4>[ 74.212309] systemd-sleep/809 just changed the state of lock: <4>[ 74.218347] ffff65a5c34c65a0 (&pcie->irq_lock){+...}-{2:2}, at: mtk_pcie_resume+0x50/0xa8 <4>[ 74.226870] but this lock was taken by another, HARDIRQ-safe lock in the past: <4>[ 74.234389] (&irq_desc_lock_class){-.-.}-{2:2} <4>[ 74.234409] <4>[ 74.234409] <4>[ 74.234409] and interrupts could create inverse lock ordering between them. <4>[ 74.234409] <4>[ 74.251704] <4>[ 74.251704] other info that might help us debug this: <4>[ 74.258785] Possible interrupt unsafe locking scenario: <4>[ 74.258785] <4>[ 74.266126] CPU0 CPU1 <4>[ 74.270942] ---- ---- <4>[ 74.275758] lock(&pcie->irq_lock);
Lock A
<4>[ 74.279627] local_irq_disable();
strictly speaking irrelevant
<4>[ 74.285836] lock(&irq_desc_lock_class);
lock B
<4>[ 74.292667] lock(&pcie->irq_lock);
lock A
<4>[ 74.299061] <Interrupt>
You do not need that interrupt.
<4>[ 74.301960] lock(&irq_desc_lock_class);
lock B
<4>[ 74.306438] <4>[ 74.306438] *** DEADLOCK ***