On 01/08/2019 01:37, Nikolay Aleksandrov wrote: > Most of the bridge device's vlan init bugs come from the fact that it's > done in the wrong place, way too early in ndo_init() before the device is > even assigned an ifindex. That makes error handling harder, especially for > older kernels which don't have bridge ndo_uninit callback. It also > introduces another bug when the bridge's dev_addr is added as fdb in the > the initial default pvid on vlan initialization, the fdb notification has > ifindex/NDA_MASTER both equal to 0 (see example below) which really > makes no sense for user-space[0]. Usually user-space software would ignore > such entries, but they are actually valid and will eventually have all > necessary attributes. I chose to change the order because this can be > backported to all kernels even pre-ndo_uninit ones without many changes > and it keeps init/deinit symmetric. As a bonus this allows us to keep > the vlan init/deinit entirely in br_vlan.c and remove those exports. > It makes much more sense to send a notification *after* the device has > registered and has a proper ifindex allocated rather than before when > there's a chance that the registration might still fail. > > For the demonstration below a small change to iproute2 for printing all fdb > notifications is added, because it contained a workaround not to show > entries with ifindex == 0. > Command executed while monitoring: $ ip l add br0 type bridge > Before (both ifindex and master == 0): > $ bridge monitor fdb > 36:7e:8a:b3:56:ba dev * vlan 1 master * permanent > > After (proper br0 ifindex): > $ bridge monitor fdb > e6:2a:ae:7a:b7:48 dev br0 vlan 1 master br0 permanent > > v2: on error in br_vlan_init set br->vlgrp to NULL > > [0] https://bugzilla.kernel.org/show_bug.cgi?id=204389 > > Reported-by: michael-dev <michael-dev@xxxxxxxxxxxxx> > Fixes: 5be5a2df40f0 ("bridge: Add filtering support for default_pvid") > Signed-off-by: Nikolay Aleksandrov <nikolay@xxxxxxxxxxxxxxxxxxx> > --- > I tried a few different approaches to resolve this but they were all > unsuitable for some kernels, this approach can go to stables easily > and IMO is the way this had to be done from the start. Alternatively > we could move only the br_vlan_add and pair it with a br_vlan_del of > default_pvid on the same events, but I don't think it hurts to move > the whole init/deinit there as it'd help older stable releases as well. > > I also tested the br_vlan_init error handling after the move by always > returning errors from all over it. Since errors at NETDEV_REGISTER cause > NETDEV_UNREGISTER we can deinit vlans properly for all cases regardless > why it happened (e.g. device destruction or init error). > > net/bridge/br.c | 5 ++++- > net/bridge/br_device.c | 10 ---------- > net/bridge/br_private.h | 19 ++++--------------- > net/bridge/br_vlan.c | 25 ++++++++++++++++++------- > 4 files changed, 26 insertions(+), 33 deletions(-) > Aargh, I apologize for the noise, this is the wrong v2 patch... Will send the correct one as v3 in a moment.