Hi Vladimir,

On Fri, 17 Mar 2023 at 14:04, Vladimir Oltean (<olteanv@xxxxxxxxx>) wrote:
>
> On Fri, Mar 17, 2023 at 01:06:43PM +0100, Álvaro Fernández Rojas wrote:
> > Hi Vladimir,
> >
> > On Fri, 17 Mar 2023 at 12:51, Vladimir Oltean (<olteanv@xxxxxxxxx>) wrote:
> > >
> > > On Fri, Mar 17, 2023 at 12:34:26PM +0100, Álvaro Fernández Rojas wrote:
> > > > b53 MMAP devices have a MDIO Mux bus controller that must be registered after
> > > > properly initializing the switch. If the MDIO Mux controller is registered
> > > > from a separate driver and the device has an external switch present, it will
> > > > cause a race condition which will hang the device.
> > >
> > > Could you describe the race in more details? Why does it hang the device?
> >
> > I didn't perform a full analysis on the problem, but what I think is
> > going on is that both b53 switches are probed and both of them fail
> > due to the ethernet device not being probed yet.
> > At some point, the internal switch is reset and not fully configured
> > and the external switch is probed again, but since the internal switch
> > isn't ready, the MDIO accesses for the external switch fail due to the
> > internal switch not being ready and this hangs the device because the
> > access to the external switch is done through the same registers from
> > the internal switch.
>
> The proposed solution is too radical for a problem that was not properly
> characterized yet, so this patch set has my temporary NACK.

Forgive me, but why do you consider this solution too radical?

>
> > But maybe Florian or Jonas can give some more details about the issue...
>
> I think you also have the tools necessary to investigate this further.
> We need to know what resource belonging to the switch is it that the
> MDIO mux needs. Where is the earliest place you can add the call to
> b53_mmap_mdiomux_init() such that your board works reliably? Note that
> b53_switch_register() indirectly calls b53_setup(). By placing this
> function where you have, the entirety of b53_setup() has finished
> execution, and we don't know exactly what is it from there that is
> needed.

The following link contains boot logs for several scenarios, all with
the same result: any attempt to call b53_mmap_mdiomux_init() earlier
than b53_switch_register() results in either a kernel panic or a device
hang:
https://gist.github.com/Noltari/b0bd6d5211160ac7bf349d998d21e7f7

1. before b53_switch_register():
[    1.756010] bcm53xx 0.1:1e: found switch: BCM53125, rev 4
[    1.761917] bcm53xx 0.1:1e: failed to register switch: -517
[    1.767759] b53-switch 10e00000.switch: MDIO mux bus init
[    1.774237] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
[    1.785673] bcm6368-enetsw 1000d800.ethernet: IRQ tx not found
[    1.795932] bcm6368-enetsw 1000d800.ethernet: mtd mac 4c:60:de:86:52:12
[    1.884320] bcm7038-wdt 1000005c.watchdog: Registered BCM7038 Watchdog
[    1.901957] NET: Registered PF_INET6 protocol family
[    1.935223] Segment Routing with IPv6
[    1.939160] In-situ OAM (IOAM) with IPv6
[    1.943514] NET: Registered PF_PACKET protocol family
[    1.949564] 8021q: 802.1Q VLAN Support v1.8
[    1.987591] CPU 1 Unable to handle kernel paging request at virtual address 00000000, epc == 804be000, ra == 804bbf3c
[    1.998697] Oops[#1]:
[    2.000995] CPU: 1 PID: 91 Comm: kworker/u4:3 Not tainted 5.15.98 #0
[    2.007533] Workqueue: events_unbound deferred_probe_work_func
[    2.013541] $ 0 : 00000000 00000001 804bdfd4 81ee6800
[    2.018916] $ 4 : 834c7000 00000000 00000002 00000001
[    2.024291] $ 8 : c0000000 00000110 00000114 00000000
[    2.029668] $12 : 00000001 81cf2f8a fffffffc 00000000
[    2.035043] $16 : 00000000 00000000 00000002 834bc680
[    2.040420] $20 : 00000000 00000080 81c0700d 81f37a40
[    2.045796] $24 : 00000018 00000000
[    2.051171] $28 : 81f58000 81f59c80 80870000 804bbf3c
[    2.056547] Hi : e6545baf
[    2.059505] Lo : a4644567
[    2.062462] epc : 804be000 mdio_mux_read+0x2c/0xd4
[    2.067569] ra : 804bbf3c __mdiobus_read+0x20/0xc4
[    2.072766] Status: 10008b03 KERNEL EXL IE
[    2.077066] Cause : 00800008 (ExcCode 02)
[    2.081187] BadVA : 00000000
[    2.084145] PrId : 0002a070 (Broadcom BMIPS4350)
[    2.088983] Modules linked in:
[    2.092119] Process kworker/u4:3 (pid: 91, threadinfo=(ptrval), task=(ptrval), tls=00000000)
[    2.100812] Stack : 00000080 80255cfc 81c0700d 81f37a40 834c7000 00000000 00000002 834c7558
[    2.109438]         00000002 804bbf3c 00000000 83501f78 834bb0b0 834df478 8194eae0 834c7000
[    2.118058]         00000000 804bc020 ffffffed 83508780 00000000 00000004 834bb0b0 81f5b800
[    2.126677]         808eb104 808eb104 81950000 804c48cc 00000003 81f5b800 81f5b800 00000000
[    2.135297]         808eb104 81f5b800 808eb104 804bc6c0 834c7570 10008b01 81f5b800 81f5b8e0
[    2.143925]         ...
[    2.146435] Call Trace:
[    2.148943] [<804be000>] mdio_mux_read+0x2c/0xd4
[    2.153697] [<804bbf3c>] __mdiobus_read+0x20/0xc4
[    2.158533] [<804bc020>] mdiobus_read+0x40/0x6c
[    2.163193] [<804c48cc>] b53_mdio_probe+0x38/0x16c
[    2.168120] [<804bc6c0>] mdio_probe+0x34/0x7c
[    2.172600] [<80437930>] really_probe.part.0+0xac/0x35c
[    2.177976] [<80437c8c>] __driver_probe_device+0xac/0x164
[    2.183531] [<80437d90>] driver_probe_device+0x4c/0x158
[    2.188907] [<80438444>] __device_attach_driver+0xd0/0x15c
[    2.194552] [<804353a0>] bus_for_each_drv+0x70/0xb0
[    2.199569] [<804380f0>] __device_attach+0xc0/0x1d8
[    2.204588] [<804367f4>] bus_probe_device+0x9c/0xb8
[    2.209604] [<80436d58>] deferred_probe_work_func+0x94/0xd4
[    2.215339] [<80058314>] process_one_work+0x290/0x4d0
[    2.220536] [<800588ac>] worker_thread+0x358/0x614
[    2.225464] [<80061064>] kthread+0x148/0x16c
[    2.229854] [<80013848>] ret_from_kernel_thread+0x14/0x1c
[    2.235413]
[    2.236931] Code: 00a0a025 8e700004 00c09025 <8e040000> 0c1ba5d8 24840558 8e020010 8e06000c 8e65000c
[    2.247011]
[    2.248726] ---[ end trace 9e5942a13795eb30 ]---
[    2.253490] Kernel panic - not syncing: Fatal exception
[    2.258831] Rebooting in 1 seconds..

2. before dsa_register_switch():
[    1.759901] bcm53xx 0.1:1e: failed to register switch: -19
[    1.765837] b53-switch 10e00000.switch: MDIO mux bus init
[    1.771412] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
[    1.782683] bcm6368-enetsw 1000d800.ethernet: IRQ tx not found
[    1.793149] bcm6368-enetsw 1000d800.ethernet: mtd mac 4c:60:de:86:52:12
[    1.875791] bcm7038-wdt 1000005c.watchdog: Registered BCM7038 Watchdog
[    1.893480] NET: Registered PF_INET6 protocol family
[    1.922283] Segment Routing with IPv6
[    1.926192] In-situ OAM (IOAM) with IPv6
[    1.930392] NET: Registered PF_PACKET protocol family
[    1.936526] 8021q: 802.1Q VLAN Support v1.8
[    2.245288] bcm53xx 1.1:1e: failed to register switch: -19
[    2.251210] b53-switch 10e00000.switch: MDIO mux bus init
[    2.256761] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
*** Device hangs ***

3. before b53_switch_init():
[    1.757728] bcm53xx 0.1:1e: failed to register switch: -19
[    1.763689] b53-switch 10e00000.switch: MDIO mux bus init
[    1.769780] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
[    1.781130] bcm6368-enetsw 1000d800.ethernet: IRQ tx not found
[    1.790996] bcm6368-enetsw 1000d800.ethernet: mtd mac 4c:60:de:86:52:12
[    1.875775] bcm7038-wdt 1000005c.watchdog: Registered BCM7038 Watchdog
[    1.893523] NET: Registered PF_INET6 protocol family
[    1.921605] Segment Routing with IPv6
[    1.925513] In-situ OAM (IOAM) with IPv6
[    1.929695] NET: Registered PF_PACKET protocol family
[    1.935809] 8021q: 802.1Q VLAN Support v1.8
[    2.244702] bcm53xx 1.1:1e: failed to register switch: -19
[    2.250653] b53-switch 10e00000.switch: MDIO mux bus init
[    2.256751] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0
*** Device hangs ***

I will be happy to run more tests if needed.

Best regards,
Álvaro.