On 16/02/2022 17:56, Tobias Waldekranz wrote: > On Wed, Feb 16, 2022 at 17:28, Nikolay Aleksandrov <nikolay@xxxxxxxxxx> wrote: >> On 16/02/2022 15:29, Tobias Waldekranz wrote: >>> The bridge has had per-VLAN STP support for a while now, since: >>> >>> https://lore.kernel.org/netdev/20200124114022.10883-1-nikolay@xxxxxxxxxxxxxxxxxxx/ >>> >>> The current implementation has some problems: >>> >>> - The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN >>> is managed independently. This is awkward from an MSTP (802.1Q-2018, >>> Clause 13.5) point of view, where the model is that multiple VLANs >>> are grouped into MST instances. >>> >>> Because of the way that the standard is written, presumably, this is >>> also reflected in hardware implementations. It is not uncommon for a >>> switch to support the full 4k range of VIDs, but that the pool of >>> MST instances is much smaller. Some examples: >>> >>> Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs >>> Marvell Prestera: 4k VLANs, but only 128 MSTIs >>> Microchip SparX-5i: 4k VLANs, but only 128 MSTIs >>> >>> - By default, the feature is enabled, and there is no way to disable >>> it. This makes it hard to add offloading in a backwards compatible >>> way, since any underlying switchdevs have no way to refuse the >>> function if the hardware does not support it >>> >>> - The port-global STP state has precedence over per-VLAN states. In >>> MSTP, as far as I understand it, all VLANs will use the common >>> spanning tree (CST) by default - through traffic engineering you can >>> then optimize your network to group subsets of VLANs to use >>> different trees (MSTI). To my understanding, the way this is >>> typically managed in silicon is roughly: >>> >>> Incoming packet: >>> .----.----.--------------.----.------------- >>> | DA | SA | 802.1Q VID=X | ET | Payload ... >>> '----'----'--------------'----'------------- >>> | >>> '->|\ .----------------------------. >>> | +--> | VID | Members | ... | MSTI | >>> PVID -->|/ |-----|---------|-----|------| >>> | 1 | 0001001 | ... | 0 | >>> | 2 | 0001010 | ... | 10 | >>> | 3 | 0001100 | ... | 10 | >>> '----------------------------' >>> | >>> .-----------------------------' >>> | .------------------------. >>> '->| MSTI | Fwding | Lrning | >>> |------|--------|--------| >>> | 0 | 111110 | 111110 | >>> | 10 | 110111 | 110111 | >>> '------------------------' >>> >>> What this is trying to show is that the STP state (whether MSTP is >>> used, or ye olde STP) is always accessed via the VLAN table. If STP >>> is running, all MSTI pointers in that table will reference the same >>> index in the STP stable - if MSTP is running, some VLANs may point >>> to other trees (like in this example). >>> >>> The fact that in the Linux bridge, the global state (think: index 0 >>> in most hardware implementations) is supposed to override the >>> per-VLAN state, is very awkward to offload. In effect, this means >>> that when the global state changes to blocking, drivers will have to >>> iterate over all MSTIs in use, and alter them all to match. This >>> also means that you have to cache whether the hardware state is >>> currently tracking the global state or the per-VLAN state. In the >>> first case, you also have to cache the per-VLAN state so that you >>> can restore it if the global state transitions back to forwarding. >>> >>> This series adds support for an arbitrary M:N mapping of VIDs to >>> MSTIs, proposing one solution to the first issue. An example of an >>> offload implementation for mv88e6xxx is also provided. Offloading is >>> done on a best-effort basis, i.e. notifications of the relevant events >>> are generated, but there is no way for the user to see whether the >>> per-VLAN state has been offloaded or not. There is also no handling of >>> the relationship between the port-global state the the per-VLAN ditto. >>> >>> If I was king of net/bridge/*, I would make the following additional >>> changes: >>> >>> - By default, when a VLAN is created, assign it to MSTID 0, which >>> would mean that no per-VLAN state is used and that packets belonging >>> to this VLAN should be filtered according to the port-global state. >>> >>> This way, when a VLAN is configured to use a separate tree (setting >>> a non-zero MSTID), an underlying switchdev could oppose it if it is >>> not supported. >>> >>> Obviously, this adds an extra step for existing users of per-VLAN >>> STP states and would thus not be backwards compatible. Maybe this >>> means that that is impossible to do, maybe not. >>> >>> - Swap the precedence of the port-global and the per-VLAN state, >>> i.e. the port-global state only applies to packets belonging to >>> VLANs that does not make use of a per-VLAN state (MSTID != 0). >>> >>> This would make the offloading much more natural, as you avoid all >>> of the caching stuff described above. >>> >>> Again, this changes the behavior of the kernel so it is not >>> backwards compatible. I suspect that this is less of an issue >>> though, since my guess is that very few people rely on the old >>> behavior. >>> >>> Thoughts? >>> >> >> Interesting! Would adding a new (e.g. vlan_mst_enable) option which changes the behaviour >> as described help? It can require that there are no vlans present to change >> similar to the per-port vlan stats option. > > Great idea, I did not know that that's how vlan stats worked. I will > definitely look into it, thanks! > >> Also based on that option you can alter >> how the state checks are performed. For example, you can skip the initial port state >> check, then in br_vlan_allowed_ingress() you can use the port state if vlan filtering >> is disabled and mst enabled and you can avoid checking it altogether if filter && mst >> are enabled then always use the vlan mst state. Similar changes would have to happen >> for the egress path. Since we are talking about multiple tests the new MST logic can >> be hidden behind a static key for both br_handle_frame() and later stages. > > Makes sense. > > So should we keep the current per-VLAN state as-is then? And bolt the > MST on to the side? I.e. should `struct net_bridge_vlan` both have `u8 > state` for the current implementation _and_ a `struct br_vlan_mst *` > that is populated for VLANs tied to a non-zero MSTI? > Good question. The u8 we should keep for the quick access/cache of state, the ptr we might escape by keeping just the mst id and fetching it, i.e. it could be just 2 bytes instead of 8, but that's not really a problem if it will be used just for bookkeeping in slow paths. It can always be pushed in the end of the struct and if a ptr makes things simpler and easier it's ok. So in short yes, I think we can keep both. >> This set needs to read a new cache line to fetch mst ptr for all packets in the vlan fast-path, >> that is definitely undesirable. Please either cache that state in the vlan and update it when >> something changes, or think of some way which avoids that cache line in fast-path. >> Alternative would be to make that cache line dependent on the new option, so it's needed >> only when mst feature is enabled. > > If we go with the approach I suggested above, then the current `u8 > state` on `struct net_bridge_vlan` could be that cache, right? > Yep, that's the idea. > With the current implementation, it is set directly - in the new MST > mode all grouped VLANs would have their states updated when updating the > MSTI's state. Right Cheers, Nik