On ons, feb 14, 2024 at 18:45, Vladimir Oltean <olteanv@xxxxxxxxx> wrote: > On Mon, Feb 12, 2024 at 08:18:43PM +0100, Tobias Waldekranz wrote: >> Before this change, generation of the list of MDB events to replay >> would race against the creation of new group memberships, either from >> the IGMP/MLD snooping logic or from user configuration. >> >> While new memberships are immediately visible to walkers of >> br->mdb_list, the notification of their existence to switchdev event >> subscribers is deferred until a later point in time. So if a replay >> list was generated during a time that overlapped with such a window, >> it would also contain a replay of the not-yet-delivered event. >> >> The driver would thus receive two copies of what the bridge internally >> considered to be one single event. On destruction of the bridge, only >> a single membership deletion event was therefore sent. As a >> consequence of this, drivers which reference count memberships (at >> least DSA), would be left with orphan groups in their hardware >> database when the bridge was destroyed. >> >> This is only an issue when replaying additions. While deletion events >> may still be pending on the deferred queue, they will already have >> been removed from br->mdb_list, so no duplicates can be generated in >> that scenario. >> >> To a user this meant that old group memberships, from a bridge in >> which a port was previously attached, could be reanimated (in >> hardware) when the port joined a new bridge, without the new bridge's >> knowledge. >> >> For example, on an mv88e6xxx system, create a snooping bridge and >> immediately add a port to it: >> >> root@infix-06-0b-00:~$ ip link add dev br0 up type bridge mcast_snooping 1 && \ >> > ip link set dev x3 up master br0 >> >> And then destroy the bridge: >> >> root@infix-06-0b-00:~$ ip link del dev br0 >> root@infix-06-0b-00:~$ mvls atu >> ADDRESS FID STATE Q F 0 1 2 3 4 5 6 7 8 9 a >> DEV:0 Marvell 88E6393X >> 33:33:00:00:00:6a 1 static - - 0 . . . . . . . . . . >> 33:33:ff:87:e4:3f 1 static - - 0 . . . . . . . . . . >> ff:ff:ff:ff:ff:ff 1 static - - 0 1 2 3 4 5 6 7 8 9 a >> root@infix-06-0b-00:~$ >> >> The two IPv6 groups remain in the hardware database because the >> port (x3) is notified of the host's membership twice: once via the >> original event and once via a replay. Since only a single delete >> notification is sent, the count remains at 1 when the bridge is >> destroyed. >> >> Then add the same port (or another port belonging to the same hardware >> domain) to a new bridge, this time with snooping disabled: >> >> root@infix-06-0b-00:~$ ip link add dev br1 up type bridge mcast_snooping 0 && \ >> > ip link set dev x3 up master br1 >> >> All multicast, including the two IPv6 groups from br0, should now be >> flooded, according to the policy of br1. But instead the old >> memberships are still active in the hardware database, causing the >> switch to only forward traffic to those groups towards the CPU (port >> 0). >> >> Eliminate the race in two steps: >> >> 1. Grab the write-side lock of the MDB while generating the replay >> list. >> >> This prevents new memberships from showing up while we are generating >> the replay list. But it leaves the scenario in which a deferred event >> was already generated, but not delivered, before we grabbed the >> lock. Therefore: >> >> 2. Make sure that no deferred version of a replay event is already >> enqueued to the switchdev deferred queue, before adding it to the >> replay list, when replaying additions. >> >> Fixes: 4f2673b3a2b6 ("net: bridge: add helper to replay port and host-joined mdb entries") >> Signed-off-by: Tobias Waldekranz <tobias@xxxxxxxxxxxxxx> >> --- > > Excellent from my side, thank you! Thanks! > Reviewed-by: Vladimir Oltean <olteanv@xxxxxxxxx> > >> @@ -307,6 +336,50 @@ int switchdev_port_obj_del(struct net_device *dev, >> } >> EXPORT_SYMBOL_GPL(switchdev_port_obj_del); >> >> +/** >> + * switchdev_port_obj_act_is_deferred - Is object action pending? >> + * >> + * @dev: port device >> + * @nt: type of action; add or delete >> + * @obj: object to test >> + * >> + * Returns true if a deferred item is exists, which is equivalent >> + * to the action @nt of an object @obj. > > nitpick: replace "is exists" with something else like "is pending" or > "exists". > > Also "action of an object" or "on an object"? Yes, these are annoying. I might as well send a v5. pw-bot: changes-requested >> + * >> + * rtnl_lock must be held. >> + */