On 26/01/2021 15:56, Nikolay Aleksandrov wrote: > On 26/01/2021 15:25, Hangbin Liu wrote: >> On Tue, Jan 26, 2021 at 09:40:13AM +0200, Nikolay Aleksandrov wrote: >>> On 26/01/2021 06:09, Hangbin Liu wrote: >>>> After adding bridge as upper layer of bond/team, we usually clean up the >>>> IP address on bond/team and set it on bridge. When there is a failover, >>>> bond/team will not send gratuitous ARP since it has no IP address. >>>> Then the down layer(e.g. VM tap dev) of bridge will not able to receive >>>> this notification. >>>> >>>> Make bridge to be able to handle NETDEV_NOTIFY_PEERS notifier. >>>> >>>> Signed-off-by: Hangbin Liu <liuhangbin@xxxxxxxxx> >>>> --- >>>> net/bridge/br.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/net/bridge/br.c b/net/bridge/br.c >>>> index ef743f94254d..b6a0921bb498 100644 >>>> --- a/net/bridge/br.c >>>> +++ b/net/bridge/br.c >>>> @@ -125,6 +125,7 @@ static int br_device_event(struct notifier_block *unused, unsigned long event, v >>>> /* Forbid underlying device to change its type. */ >>>> return NOTIFY_BAD; >>>> >>>> + case NETDEV_NOTIFY_PEERS: >>>> case NETDEV_RESEND_IGMP: >>>> /* Propagate to master device */ >>>> call_netdevice_notifiers(event, br->dev); >>>> >>> >>> I'm not convinced this should be done by the bridge, setups usually have multiple ports >>> which may have link change events and these events are unrelated, i.e. we shouldn't generate >>> a gratuitous arp for all every time, there might be many different devices present. We have >>> setups with hundreds of ports which are mixed types of devices. >>> That seems inefficient, redundant and can potentially cause problems. >> >> Hi Nikolay, >> >> Thanks for the reply. There are a few reasons I think the bridge should >> handle NETDEV_NOTIFY_PEERS: >> >> 1. Only a few devices will call NETDEV_NOTIFY_PEERS notifier: bond, team, >> virtio, xen, 6lowpan. There should have no much notification message. > > You can't send a broadcast to all ports because 1 bond's link status has changed. > That makes no sense, the GARP needs to be sent only on that bond. The bond devices > are heavily used with bridge setups, and in general the bridge is considered a switch > device, it shouldn't be broadcasting GARPs to all ports when one changes link state. > Scratch the last sentence, I guess you're talking about when the bond's mac causes the bridge to change mac address by br_stp_recalculate_bridge_id(). I was wondering at first why would you need to send garp, but in fact, as Ido mentioned privately, it is already handled correctly, but you need to have set arp_notify sysctl. Then if the bridge's mac changes because of the bond flapping a NETDEV_NOTIFY_PEERS will be generated. Check: devinet.c inetdev_event() -> case NETDEV_CHANGEADDR Alternatively you can always set the bridge mac address manually and then it won't be changed by such events. >> 2. When set bond/team's upper layer to bridge. The bridge's mac will be the >> same with bond/team. So when the bond/team's mac changed, the bridge's mac >> will also change. So bridge should send a GARP to notify other's that it's >> mac has changed. > > That is not true, the mac doesn't need to be the same at all. And in many > situations isn't. > >> 3. There already has NETDEV_RESEND_IGMP handling in bridge, which is also >> generated by bond/team and netdev_notify_peers(). So why there is IGMP >> but no ARP? > > Apples and oranges.. > >> 4. If bridge doesn't have IP address, it will omit GARP sending. So having >> or not having IP address on bridge doesn't matters. >> 4. I don't see why how many ports affect the bridge sending GARP. > > Bridge broadcasts are notoriously slow, they consider every port. We've seen glean > traffic take up 100% CPU with only 10k pps. I have patches that fix the situation for > *some* cases (i.e. where not all ports need to be considered), but in general you can't > optimize it much, so it's best to avoid sending them altogether. > Just imagine having a hundred SVIs on top of the bridge, that would lead to number if SVIs > multipled by the number of ports broadcast packets for each link flap of some bond/team port. > Same thing happens if there are macvlans on top, we have setups with thousands of virtual devices > and this will just kill them, if it was at all correct behaviour then we might look for a solution > but it is not in general. GARPs must be confined only to the bond ports which changed state, and > not broadcast to all every time. Again scratch the last part, I misunderstood why you need garps at first. > >> >> Please correct me if I missed something. >> >>> Also it seems this was proposed few years back: https://lkml.org/lkml/2018/1/6/135 >> >> Thanks for this link, cc Stephen for this discuss. >> >> Hangbin >> > >