On 15/05/2022 00:41, Nikolay Aleksandrov wrote: > On 13/05/2022 20:43, Jonathan Toppins wrote: >> Implement a MAC filter that prevents duplicate frame delivery when >> handling BUM traffic. This attempts to partially replicate OvS SLB >> Bonding[1] like functionality without requiring significant change >> in the Linux bridging code. >> >> A typical network setup for this feature would be: >> >> .--------------------------------------------. >> | .--------------------. | >> | | | | >> .-------------------. | | >> | | Bond 0 | | | | >> | .--'---. .---'--. | | | >> .----|-| eth0 |-| eth1 |-|----. .-----+----. .----+------. >> | | '------' '------' | | | Switch 1 | | Switch 2 | >> | '---,---------------' | | +---+ | >> | / | '----+-----' '----+------' >> | .---'---. .------. | | | >> | | br0 |----| VM 1 | | ~~~~~~~~~~~~~~~~~~~~~ >> | '-------' '------' | ( ) >> | | .------. | ( Rest of Network ) >> | '--------| VM # | | (_____________________) >> | '------' | >> | Host 1 | >> '-----------------------------' >> >> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with >> bond0 and its associated links, eth0 & eth1, provide ingress/egress. One >> can assume bond0, br1, and hosts VM1 to VM# are all contained in a >> single box, as depicted. Interfaces eth0 and eth1 provide redundant >> connections to the data center with the requirement to use all bandwidth >> when the system is functioning normally. Switch 1 and Switch 2 are >> physical switches that do not implement any advanced L2 management >> features such as MLAG, Cisco's VPC, or LACP. >> >> Combining this feature with vlan+srcmac hash policy allows a user to >> create an access network without the need to use expensive switches that >> support features like Cisco's VCP. >> >> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding >> >> Co-developed-by: Long Xin <lxin@xxxxxxxxxx> >> Signed-off-by: Long Xin <lxin@xxxxxxxxxx> >> Signed-off-by: Jonathan Toppins <jtoppins@xxxxxxxxxx> >> --- >> >> Notes: >> v2: >> * dropped needless abstraction functions and put code in module init >> * renamed variable "rc" to "ret" to stay consistent with most of the >> code >> * fixed parameter setting management, when arp-monitor is turned on >> this feature will be turned off similar to how miimon and arp-monitor >> interact >> * renamed bond_xor_recv to bond_mac_filter_recv for a little more >> clarity >> * it appears the implied default return code for any bonding recv probe >> must be `RX_HANDLER_ANOTHER`. Changed the default return code of >> bond_mac_filter_recv to use this return value to not break skb >> processing when the skb dev is switched to the bond dev: >> `skb->dev = bond->dev` >> >> v3: Nik's comments >> * clarified documentation >> * fixed inline and basic reverse Christmas tree formatting >> * zero'ed entry in mac_create >> * removed read_lock taking in bond_mac_filter_recv >> * made has_expired() atomic and removed critical sections >> surrounding calls to has_expired(), this also removed the >> use-after-free that would have occurred: >> spin_lock_irqsave(&entry->lock, flags); >> if (has_expired(bond, entry)) >> mac_delete(bond, entry); >> spin_unlock_irqrestore(&entry->lock, flags); <--- >> * moved init/destroy of mac_filter_tbl to bond_open/bond_close >> this removed the complex option dependencies, the only behavioural >> change the user will see is if the bond is up and mac_filter is >> enabled if they try and set arp_interval they will receive -EBUSY >> * in bond_changelink moved processing of mac_filter option just below >> mode processing >> >> Documentation/networking/bonding.rst | 20 +++ >> drivers/net/bonding/Makefile | 2 +- >> drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++ >> drivers/net/bonding/bond_mac_filter.h | 37 +++++ >> drivers/net/bonding/bond_main.c | 30 ++++ >> drivers/net/bonding/bond_netlink.c | 13 ++ >> drivers/net/bonding/bond_options.c | 81 +++++++++-- >> drivers/net/bonding/bonding_priv.h | 1 + >> include/net/bond_options.h | 1 + >> include/net/bonding.h | 3 + >> include/uapi/linux/if_link.h | 1 + >> 11 files changed, 373 insertions(+), 17 deletions(-) >> create mode 100644 drivers/net/bonding/bond_mac_filter.c >> create mode 100644 drivers/net/bonding/bond_mac_filter.h >> > [snip] The same problem solved using a few nftables rules (in case you don't want to load eBPF): $ nft 'add table netdev nt' $ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }' $ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }' $ nft 'add set netdev nt macset { type ether_addr; flags timeout; }' $ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset' $ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop' Cheers, Nik