Hello, I have one question about the process of patch/fixes backporting to old stable kernel. Assuming the following patch will be deemed correct and applied it will have to be eventually backported. This patch will apply cleanly on v5.15+, but some work is needed to backport to v5.10 or older. Should I send a patch for the older kernels once the fix is merged? Reading Documentation/process/stable-kernel-rules.rst it was not clear to me what to do in this "mixed" situations. Thanks a lot, Francesco On Fri, May 06, 2022 at 08:08:15AM +0200, Francesco Dolcini wrote: > This fixes the following error caused by a race condition between > phydev->adjust_link() and a MDIO transaction in the phy interrupt > handler. The issue was reproduced with the ethernet FEC driver and a > micrel KSZ9031 phy. > > [ 146.195696] fec 2188000.ethernet eth0: MDIO read timeout > [ 146.201779] ------------[ cut here ]------------ > [ 146.206671] WARNING: CPU: 0 PID: 571 at drivers/net/phy/phy.c:942 phy_error+0x24/0x6c > [ 146.214744] Modules linked in: bnep imx_vdoa imx_sdma evbug > [ 146.220640] CPU: 0 PID: 571 Comm: irq/128-2188000 Not tainted 5.18.0-rc3-00080-gd569e86915b7 #9 > [ 146.229563] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree) > [ 146.236257] unwind_backtrace from show_stack+0x10/0x14 > [ 146.241640] show_stack from dump_stack_lvl+0x58/0x70 > [ 146.246841] dump_stack_lvl from __warn+0xb4/0x24c > [ 146.251772] __warn from warn_slowpath_fmt+0x5c/0xd4 > [ 146.256873] warn_slowpath_fmt from phy_error+0x24/0x6c > [ 146.262249] phy_error from kszphy_handle_interrupt+0x40/0x48 > [ 146.268159] kszphy_handle_interrupt from irq_thread_fn+0x1c/0x78 > [ 146.274417] irq_thread_fn from irq_thread+0xf0/0x1dc > [ 146.279605] irq_thread from kthread+0xe4/0x104 > [ 146.284267] kthread from ret_from_fork+0x14/0x28 > [ 146.289164] Exception stack(0xe6fa1fb0 to 0xe6fa1ff8) > [ 146.294448] 1fa0: 00000000 00000000 00000000 00000000 > [ 146.302842] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 > [ 146.311281] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000 > [ 146.318262] irq event stamp: 12325 > [ 146.321780] hardirqs last enabled at (12333): [<c01984c4>] __up_console_sem+0x50/0x60 > [ 146.330013] hardirqs last disabled at (12342): [<c01984b0>] __up_console_sem+0x3c/0x60 > [ 146.338259] softirqs last enabled at (12324): [<c01017f0>] __do_softirq+0x2c0/0x624 > [ 146.346311] softirqs last disabled at (12319): [<c01300ac>] __irq_exit_rcu+0x138/0x178 > [ 146.354447] ---[ end trace 0000000000000000 ]--- > > With the FEC driver phydev->adjust_link() calls fec_enet_adjust_link() > calls fec_stop()/fec_restart() and both these function reset and > temporary disable the FEC disrupting any MII transaction that > could be happening at the same time. > > fec_enet_adjust_link() and phy_read() can be running at the same time > when we have one additional interrupt before the phy_state_machine() is > able to terminate. > > Thread 1 (phylib WQ) | Thread 2 (phy interrupt) > | > | phy_interrupt() <-- PHY IRQ > | handle_interrupt() > | phy_read() > | phy_trigger_machine() > | --> schedule phylib WQ > | > | > phy_state_machine() | > phy_check_link_status() | > phy_link_change() | > phydev->adjust_link() | > fec_enet_adjust_link() | > --> FEC reset | phy_interrupt() <-- PHY IRQ > | phy_read() > | > > Fix this by acquiring the phydev lock in phy_interrupt(). > > Link: https://lore.kernel.org/all/20220422152612.GA510015@xxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > Fixes: c974bdbc3e77 ("net: phy: Use threaded IRQ, to allow IRQ from sleeping devices") > cc: <stable@xxxxxxxxxxxxxxx> > Signed-off-by: Francesco Dolcini <francesco.dolcini@xxxxxxxxxxx> > --- > drivers/net/phy/phy.c | 7 ++++++- > 1 file changed, 6 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c > index beb2b66da132..f122026c4682 100644 > --- a/drivers/net/phy/phy.c > +++ b/drivers/net/phy/phy.c > @@ -970,8 +970,13 @@ static irqreturn_t phy_interrupt(int irq, void *phy_dat) > { > struct phy_device *phydev = phy_dat; > struct phy_driver *drv = phydev->drv; > + irqreturn_t ret; > > - return drv->handle_interrupt(phydev); > + mutex_lock(&phydev->lock); > + ret = drv->handle_interrupt(phydev); > + mutex_unlock(&phydev->lock); > + > + return ret; > } > > /** > -- > 2.25.1 >