On Mon, Jan 04, 2021 at 04:11:02PM +0100, Geert Uytterhoeven wrote:
> Hi Ioana,
>
> On Mon, Jan 4, 2021 at 3:53 PM Ioana Ciornei <ioana.ciornei@xxxxxxx> wrote:
> > On Mon, Jan 04, 2021 at 01:24:15PM +0100, Geert Uytterhoeven wrote:
> > > Wolfram reports that his R-Car H2-based Lager board can no longer be
> > > rebooted in v5.11-rc1, as it crashes with an imprecise external abort.
> > > The issue can be reproduced on other boards (e.g. Koelsch with R-Car
> > > M2-W) too, if CONFIG_IP_PNP is disabled:
> >
> > What kind of PHYs are used on these boards?
>
> Micrel KSZ8041RNLI
>
> > > Unhandled fault: imprecise external abort (0x1406) at 0x00000000
> > > pgd = (ptrval)
> > > [00000000] *pgd=422b6835, *pte=00000000, *ppte=00000000
> > > Internal error: : 1406 [#1] ARM
> > > Modules linked in:
> > > CPU: 0 PID: 1105 Comm: init Tainted: G W 5.10.0-rc1-00402-ge2f016cf7751 #1048
> > > Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
> > > PC is at sh_mdio_ctrl+0x44/0x60
> > > LR is at sh_mmd_ctrl+0x20/0x24
> > > ...
> > > Backtrace:
> > > [<c0451f30>] (sh_mdio_ctrl) from [<c0451fd4>] (sh_mmd_ctrl+0x20/0x24)
> > > r7:0000001f r6:00000020 r5:00000002 r4:c22a1dc4
> > > [<c0451fb4>] (sh_mmd_ctrl) from [<c044fc18>] (mdiobb_cmd+0x38/0xa8)
> > > [<c044fbe0>] (mdiobb_cmd) from [<c044feb8>] (mdiobb_read+0x58/0xdc)
> > > r9:c229f844 r8:c0c329dc r7:c221e000 r6:00000001 r5:c22a1dc4 r4:00000001
> > > [<c044fe60>] (mdiobb_read) from [<c044c854>] (__mdiobus_read+0x74/0xe0)
> > > r7:0000001f r6:00000001 r5:c221e000 r4:c221e000
> > > [<c044c7e0>] (__mdiobus_read) from [<c044c9d8>] (mdiobus_read+0x40/0x54)
> > > r7:0000001f r6:00000001 r5:c221e000 r4:c221e458
> > > [<c044c998>] (mdiobus_read) from [<c044d678>] (phy_read+0x1c/0x20)
> > > r7:ffffe000 r6:c221e470 r5:00000200 r4:c229f800
> > > [<c044d65c>] (phy_read) from [<c044d94c>] (kszphy_config_intr+0x44/0x80)
> > > [<c044d908>] (kszphy_config_intr) from [<c044694c>] (phy_disable_interrupts+0x44/0x50)
> > > r5:c229f800 r4:c229f800
> > > [<c0446908>] (phy_disable_interrupts) from [<c0449370>] (phy_shutdown+0x18/0x1c)
> > > r5:c229f800 r4:c229f804
> > > [<c0449358>] (phy_shutdown) from [<c040066c>] (device_shutdown+0x168/0x1f8)
> > > [<c0400504>] (device_shutdown) from [<c013de44>] (kernel_restart_prepare+0x3c/0x48)
> > > r9:c22d2000 r8:c0100264 r7:c0b0d034 r6:00000000 r5:4321fedc r4:00000000
> > > [<c013de08>] (kernel_restart_prepare) from [<c013dee0>] (kernel_restart+0x1c/0x60)
> > > [<c013dec4>] (kernel_restart) from [<c013e1d8>] (__do_sys_reboot+0x168/0x208)
> > > r5:4321fedc r4:01234567
> > > [<c013e070>] (__do_sys_reboot) from [<c013e2e8>] (sys_reboot+0x18/0x1c)
> > > r7:00000058 r6:00000000 r5:00000000 r4:00000000
> > > [<c013e2d0>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
> > >
> > > Calling phy_disable_interrupts() unconditionally means that the PHY
> > > registers may be accessed while the device is suspended, causing
> > > undefined behavior, which may crash the system.
> > >
> > > Fix this by calling phy_disable_interrupts() only when the PHY has been
> > > started.
> > >
> > > Reported-by: Wolfram Sang <wsa+renesas@xxxxxxxxxxxxxxxxxxxx>
> > > Fixes: e2f016cf775129c0 ("net: phy: add a shutdown procedure")
> > > Signed-off-by: Geert Uytterhoeven <geert+renesas@xxxxxxxxx>
> > > ---
> > > Marked RFC as I do not know if this change breaks the use case fixed by
> > > the faulty commit.
> >
> > I haven't tested it yet but most probably this change would partially
> > revert the behavior to how things were before adding the shutdown
> > procedure.
> >
> > And this is because the interrupts are enabled at phy_connect and not at
> > phy_start so we would want to disable any PHY interrupts even though the
> > PHY has not been started yet.
>
> Makes sense.
>
> > > Alternatively, the device may have to be started
> > > explicitly first.
> >
> > Have you actually tried this out and it worked?
>
> No, I haven't tested restarting the device first.
> I would like to avoid starting the device during shutdown, unless it is
> absolutely necessary.

I was talking about starting the PHY device but, in light of the new
information, this would lead to the exact same crash since it's just
another PHY register access. Now I understand that you were referring to
the sh_eth device itself.

>
> > I am asking this because I would much rather expect this to be a problem
> > with how the sh_eth driver behaves if the netdevice did not connect to
> > the PHY (this is done in .open() alongside the phy_start()) and it
> > suddenly has to interact with it through the mdiobb_ops callbacks.
> >
> > Also, I just re-tested this use case in which I do not start the
> > interface and just issue a reboot, and it behaves as expected.
>
> It depends on the hardware: the sh_eth device is powered down when its
> module clock is stopped. When powered down, any access to the sh_eth
> registers or to the PHY connected to it will cause a crash.
>
> On most other hardware, you can access the PHY regardless, and no crash
> will happen.

Ok, so this does not have anything to do with interrupts explicitly but
rather with the fact that any PHY access will cause a crash when the
sh_eth device is powered down.

If the device is powered down before the actual .ndo_open(), how is the
probe actually setting up the device? Or is the device returned to the
powered-down state after the probe and only powered up at a subsequent
.ndo_open()?

Instead of the phy_is_started() call we could check if we had previously
enabled the interrupts on the PHY, but this would mean that a basic
assumption of the PHY library is violated, in that a registered PHY
device cannot access its registers because the MDIO controller just
decided so.

Can't the MDIO bitbang driver callbacks just check if the device is
powered down and, if it is, just power it back up temporarily?
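
To make the last suggestion a bit more concrete, here is a rough
userspace model of what I mean. This is only a sketch and not sh_eth
code: struct fake_mac, fake_mac_get()/fake_mac_put(), raw_mdio_read() and
guarded_mdio_read() are all hypothetical names, and the usage counter
only imitates the idea of a runtime-PM-style reference. In the driver
this would presumably translate to a get/put pair (for example
pm_runtime_get_sync()/pm_runtime_put()) around the MDIO callbacks, but
that mapping is an assumption on my part.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the MAC that owns the MDIO bus. */
struct fake_mac {
	int pm_usage;          /* usage count; 0 models "module clock stopped" */
	unsigned int faults;   /* accesses attempted while powered down */
	uint16_t phy_regs[32]; /* simulated PHY register file */
};

/* Take and drop a power reference (models temporary power-up). */
static void fake_mac_get(struct fake_mac *mac) { mac->pm_usage++; }
static void fake_mac_put(struct fake_mac *mac) { mac->pm_usage--; }

/*
 * Unguarded access: if nobody holds a power reference, record a fault
 * and return -1. On the real hardware this is where the external abort
 * would be raised instead.
 */
static int raw_mdio_read(struct fake_mac *mac, int reg)
{
	if (mac->pm_usage == 0) {
		mac->faults++;
		return -1;
	}
	return mac->phy_regs[reg & 0x1f];
}

/*
 * Guarded access: hold a power reference for the duration of the bus
 * transaction, then release it so the device can power down again.
 */
static int guarded_mdio_read(struct fake_mac *mac, int reg)
{
	int ret;

	fake_mac_get(mac);
	ret = raw_mdio_read(mac, reg);
	fake_mac_put(mac);

	return ret;
}
```

With something like this, phy_shutdown() could keep calling
phy_disable_interrupts() unconditionally: a read through the guarded
path succeeds even when the interface was never opened, and the usage
count drops back to zero afterwards.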