On 16/05/2022 20:22, Nikolay Aleksandrov wrote: > On 16/05/2022 17:06, Jonathan Toppins wrote: >> On 5/15/22 02:32, Nikolay Aleksandrov wrote: >>> On 15/05/2022 00:41, Nikolay Aleksandrov wrote: >>>> On 13/05/2022 20:43, Jonathan Toppins wrote: >>>>> Implement a MAC filter that prevents duplicate frame delivery when >>>>> handling BUM traffic. This attempts to partially replicate OvS SLB >>>>> Bonding[1] like functionality without requiring significant change >>>>> in the Linux bridging code. >>>>> >>>>> A typical network setup for this feature would be: >>>>> >>>>> .--------------------------------------------. >>>>> | .--------------------. | >>>>> | | | | >>>>> .-------------------. | | >>>>> | | Bond 0 | | | | >>>>> | .--'---. .---'--. | | | >>>>> .----|-| eth0 |-| eth1 |-|----. .-----+----. .----+------. >>>>> | | '------' '------' | | | Switch 1 | | Switch 2 | >>>>> | '---,---------------' | | +---+ | >>>>> | / | '----+-----' '----+------' >>>>> | .---'---. .------. | | | >>>>> | | br0 |----| VM 1 | | ~~~~~~~~~~~~~~~~~~~~~ >>>>> | '-------' '------' | ( ) >>>>> | | .------. | ( Rest of Network ) >>>>> | '--------| VM # | | (_____________________) >>>>> | '------' | >>>>> | Host 1 | >>>>> '-----------------------------' >>>>> >>>>> Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with >>>>> bond0 and its associated links, eth0 & eth1, provide ingress/egress. One >>>>> can assume bond0, br1, and hosts VM1 to VM# are all contained in a >>>>> single box, as depicted. Interfaces eth0 and eth1 provide redundant >>>>> connections to the data center with the requirement to use all bandwidth >>>>> when the system is functioning normally. Switch 1 and Switch 2 are >>>>> physical switches that do not implement any advanced L2 management >>>>> features such as MLAG, Cisco's VPC, or LACP. >>>>> >>>>> Combining this feature with vlan+srcmac hash policy allows a user to >>>>> create an access network without the need to use expensive switches that >>>>> support features like Cisco's VCP. >>>>> >>>>> [1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding >>>>> >>>>> Co-developed-by: Long Xin <lxin@xxxxxxxxxx> >>>>> Signed-off-by: Long Xin <lxin@xxxxxxxxxx> >>>>> Signed-off-by: Jonathan Toppins <jtoppins@xxxxxxxxxx> >>>>> --- >>>>> >>>>> Notes: >>>>> v2: >>>>> * dropped needless abstraction functions and put code in module init >>>>> * renamed variable "rc" to "ret" to stay consistent with most of the >>>>> code >>>>> * fixed parameter setting management, when arp-monitor is turned on >>>>> this feature will be turned off similar to how miimon and arp-monitor >>>>> interact >>>>> * renamed bond_xor_recv to bond_mac_filter_recv for a little more >>>>> clarity >>>>> * it appears the implied default return code for any bonding recv probe >>>>> must be `RX_HANDLER_ANOTHER`. Changed the default return code of >>>>> bond_mac_filter_recv to use this return value to not break skb >>>>> processing when the skb dev is switched to the bond dev: >>>>> `skb->dev = bond->dev` >>>>> v3: Nik's comments >>>>> * clarified documentation >>>>> * fixed inline and basic reverse Christmas tree formatting >>>>> * zero'ed entry in mac_create >>>>> * removed read_lock taking in bond_mac_filter_recv >>>>> * made has_expired() atomic and removed critical sections >>>>> surrounding calls to has_expired(), this also removed the >>>>> use-after-free that would have occurred: >>>>> spin_lock_irqsave(&entry->lock, flags); >>>>> if (has_expired(bond, entry)) >>>>> mac_delete(bond, entry); >>>>> spin_unlock_irqrestore(&entry->lock, flags); <--- >>>>> * moved init/destroy of mac_filter_tbl to bond_open/bond_close >>>>> this removed the complex option dependencies, the only behavioural >>>>> change the user will see is if the bond is up and mac_filter is >>>>> enabled if they try and set arp_interval they will receive -EBUSY >>>>> * in bond_changelink moved processing of mac_filter option just below >>>>> mode processing >>>>> >>>>> Documentation/networking/bonding.rst | 20 +++ >>>>> drivers/net/bonding/Makefile | 2 +- >>>>> drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++ >>>>> drivers/net/bonding/bond_mac_filter.h | 37 +++++ >>>>> drivers/net/bonding/bond_main.c | 30 ++++ >>>>> drivers/net/bonding/bond_netlink.c | 13 ++ >>>>> drivers/net/bonding/bond_options.c | 81 +++++++++-- >>>>> drivers/net/bonding/bonding_priv.h | 1 + >>>>> include/net/bond_options.h | 1 + >>>>> include/net/bonding.h | 3 + >>>>> include/uapi/linux/if_link.h | 1 + >>>>> 11 files changed, 373 insertions(+), 17 deletions(-) >>>>> create mode 100644 drivers/net/bonding/bond_mac_filter.c >>>>> create mode 100644 drivers/net/bonding/bond_mac_filter.h >>>>> >>>> >>> [snip] >>> >>> The same problem solved using a few nftables rules (in case you don't want to load eBPF): >>> $ nft 'add table netdev nt' >>> $ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }' >>> $ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }' >>> $ nft 'add set netdev nt macset { type ether_addr; flags timeout; }' >>> $ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset' >>> $ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop' >>> >> >> I get the following when trying to apply this on a fedora 35 install. >> >> root@fedora ~]# ip link add bond0 type bond mode balance-xor xmit_hash_policy vlan+srcmac >> [root@fedora ~]# nft 'add table netdev nt' >> [root@fedora ~]# nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }' >> Error: unknown chain hook >> add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; } >> ^^^^^^ >> [root@fedora ~]# uname -a >> Linux fedora 5.17.5-200.fc35.x86_64 #1 SMP PREEMPT Thu Apr 28 15:41:41 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux >> > > Well, take it up with the Fedora nftables package maintainer. :) > > Your nftables version is old (I'd guess <1.0.1): > commit 510c4fad7e78 > Author: Lukas Wunner <lukas@xxxxxxxxx> > Date: Wed Mar 11 13:20:06 2020 +0100 > > src: Support netdev egress hook > > $ git tag --contains 510c4fad7e78f > v1.0.1 > v1.0.2 > > I just tested it[1] on Linux 5.16.18-200.fc35.x86_64 #1 SMP PREEMPT Mon Mar 28 14:10:07 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux > > Cheers, > Nik > > [1] You can clearly see the dynamically learned mac on egress (52:54:00:23:5f:13) and traffic > with that source is now blocked on ingress. > > $ nft -a list table netdev nt > set macset { # handle 10 > type ether_addr > size 65535 > flags timeout > elements = { 52:54:00:23:5f:13 timeout 5s expires 4s192ms } > } > > chain bond0EgressFilter { # handle 8 > type filter hook egress device "bond0" priority filter; policy accept; > update @macset { ether saddr timeout 5s } # handle 11 > } > > chain bond0IngressFilter { # handle 9 > type filter hook ingress device "bond0" priority filter; policy accept; > } > Oops, pasted an old list before adding the ingress rule: chain bond0IngressFilter { # handle 9 type filter hook ingress device "bond0" priority filter; policy accept; ether saddr @macset counter packets 120 bytes 7680 drop # handle 15 }