El lun, 20 mar 2023 a las 11:27, Jonas Gorski (<jonas.gorski@xxxxxxxxx>) escribió: > > On Sun, 19 Mar 2023 at 10:45, Álvaro Fernández Rojas <noltari@xxxxxxxxx> wrote: > > > > El vie, 17 mar 2023 a las 17:41, Florian Fainelli > > (<f.fainelli@xxxxxxxxx>) escribió: > > > > > > On 3/17/23 09:23, Álvaro Fernández Rojas wrote: > > > > El vie, 17 mar 2023 a las 15:29, Vladimir Oltean (<olteanv@xxxxxxxxx>) escribió: > > > >> > > > >> On Fri, Mar 17, 2023 at 03:17:12PM +0100, Álvaro Fernández Rojas wrote: > > > >>>> The proposed solution is too radical for a problem that was not properly > > > >>>> characterized yet, so this patch set has my temporary NACK. > > > >>> > > > >>> Forgive me, but why do you consider this solution too radical? > > > >> > > > >> Because it involves changing device tree bindings (stable ABI) in an > > > >> incompatible way. > > > >> > > > >>>> > > > >>>>> But maybe Florian or Jonas can give some more details about the issue... > > > >>>> > > > >>>> I think you also have the tools necessary to investigate this further. > > > >>>> We need to know what resource belonging to the switch is it that the > > > >>>> MDIO mux needs. Where is the earliest place you can add the call to > > > >>>> b53_mmap_mdiomux_init() such that your board works reliably? Note that > > > >>>> b53_switch_register() indirectly calls b53_setup(). By placing this > > > >>>> function where you have, the entirety of b53_setup() has finished > > > >>>> execution, and we don't know exactly what is it from there that is > > > >>>> needed. > > > >>> > > > >>> In the following link you will find different bootlogs related to > > > >>> different scenarios all of them with the same result: any attempt of > > > >>> calling b53_mmap_mdiomux_init() earlier than b53_switch_register() > > > >>> will either result in a kernel panic or a device hang: > > > >>> https://gist.github.com/Noltari/b0bd6d5211160ac7bf349d998d21e7f7 > > > >>> > > > >>> 1. before b53_switch_register(): > > > >>> > > > >>> 2. before dsa_register_switch(): > > > >>> > > > >>> 3. before b53_switch_init(): > > > >> > > > >> Did you read what I said? > > > > > > > > Yes, but I didn't get your point, sorry for that. > > > > > > > >> > > > >> | Note that b53_switch_register() indirectly calls b53_setup(). By placing > > > >> | this function where you have, the entirety of b53_setup() has finished > > > >> | execution, and we don't know exactly what is it from there that is > > > >> | needed. > > > >> > > > >> Can you place the b53_mmap_mdiomux_init() in various places within > > > >> b53_setup() to restrict the search further? > > > > > > > > I tried and these are the results: > > > > https://gist.github.com/Noltari/d5bdba66b8f2e392c9e4c2759661d862 > > > > > > > > All of them hang when dsa_tree_setup() is called for DSA tree 1 > > > > (external switch) without having completely setup DSA tree 0 (internal > > > > switch): > > > > [ 1.471345] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0 > > > > [ 1.481099] bcm6368-enetsw 1000d800.ethernet: IRQ tx not found > > > > [ 1.506752] bcm6368-enetsw 1000d800.ethernet: mtd mac 4c:60:de:86:52:12 > > > > [ 1.594365] bcm7038-wdt 1000005c.watchdog: Registered BCM7038 Watchdog > > > > [ 1.612008] NET: Registered PF_INET6 protocol family > > > > [ 1.645617] Segment Routing with IPv6 > > > > [ 1.649547] In-situ OAM (IOAM) with IPv6 > > > > [ 1.653948] NET: Registered PF_PACKET protocol family > > > > [ 1.659984] 8021q: 802.1Q VLAN Support v1.8 > > > > [ 1.699193] b53-switch 10e00000.switch: found switch: BCM63xx, rev 0 > > > > [ 2.124257] bcm53xx 0.1:1e: found switch: BCM53125, rev 4 > > > > *** Device hang *** > > > > > > > > I don't know if there's a way to defer the probe of DSA tree 1 (the > > > > external switch) until DSA tree 0 (the internal switch) is completely > > > > setup, because that would probably be the only alternative way of > > > > fixing this. > > > > > > Could you find out which part is hanging? It looks like there is a busy > > > waiting operation that we never complete? > > > > After many tests I was able to find the part that was hanging the device. > > It turns out that if the MDIO bus controller is registered soon > > enough, b53_phy_read16 will be called for the RGMII port on the > > internal switch: > > [ 4.042698] b53-switch 10e00000.switch: b53_phy_read16: ds=81fede80 > > phy_read16=00000000 addr=4 reg=2 > > It turns out that the device is hanging on the following line of > > b53_phy_read16(): > > b53_read16(priv, B53_PORT_MII_PAGE(addr), reg * 2, &value); > > Maybe it's not safe to access B53_PORT_MII_PAGE() on MMAP switches? > > If you are following the example from 1/3, then I think I see what the > issue might be here: > > You are labeling the port where the external switch is connected as > "extsw", which is neither "cpu" nor "dsa", so it is treated as a > normal/user port (which it shouldn't). If you change its label to > "dsa" (which AFAIU would be the correct one to denote a daisy chained > switch) it should not try to access port 4's MDIO registers (via the > slave mdio bus). Correct me if I'm wrong, but I think that the configuration you're suggesting would be for a different kind of switches layout. In this case I'm using a disjoint tree setup since both switches are using different tags incompatible with each other. So the proper switch layout should be the following (for a Huawei HG253s, which is a BCM6362 with a WAN port on the internal switch port 5 and the external switch BCM53124S connected to the internal switch on port 4): https://gist.github.com/Noltari/975374f908bb056ecf2544d289255b2e#file-b53-disjoint-switch-trees-hg253s-v2-dts BTW, this is the log (as you can see b53_mmap_phy_read16() is used for ports 4 & 5 at the beginning. After the switch initialization, the mdio-mux bus controller is used): https://gist.github.com/Noltari/975374f908bb056ecf2544d289255b2e#file-b53-disjoint-switch-trees-hg253s-v2-log I think that we need this kind of layout because as we already discussed on "net: dsa: tag_brcm: legacy: fix daisy-chained switches" (https://patchwork.kernel.org/project/netdevbpf/patch/20230317120815.321871-1-noltari@xxxxxxxxx/) it's the only way of removing the incorrect VLAN tag added by the internal BCM63xx switch when it receives a packet from the external switch on port 4 (RGMII). Otherwise, the double tag won't be processed correctly due to the invalid VLAN tag and the packet won't be correctly forwarded from the external switch to the internal switch. But I may be wrong here since I don't have much experience with DSA... That's why I decided to upstream all the patches that I sent lately, to seek for help of the experts :) > > Can you check if that helps? > > Regards > Jonas Best regards, Álvaro.