On 5/15/22 02:32, Nikolay Aleksandrov wrote:
On 15/05/2022 00:41, Nikolay Aleksandrov wrote:
On 13/05/2022 20:43, Jonathan Toppins wrote:
Implement a MAC filter that prevents duplicate frame delivery when
handling BUM traffic. This attempts to partially replicate OvS SLB
Bonding[1] like functionality without requiring significant change
in the Linux bridging code.
A typical network setup for this feature would be:
.--------------------------------------------.
| .--------------------. |
| | | |
.-------------------. | |
| | Bond 0 | | | |
| .--'---. .---'--. | | |
.----|-| eth0 |-| eth1 |-|----. .-----+----. .----+------.
| | '------' '------' | | | Switch 1 | | Switch 2 |
| '---,---------------' | | +---+ |
| / | '----+-----' '----+------'
| .---'---. .------. | | |
| | br0 |----| VM 1 | | ~~~~~~~~~~~~~~~~~~~~~
| '-------' '------' | ( )
| | .------. | ( Rest of Network )
| '--------| VM # | | (_____________________)
| '------' |
| Host 1 |
'-----------------------------'
Where 'VM1' and 'VM#' are hosts connected to a Linux bridge, br0, with
bond0 and its associated links, eth0 & eth1, provide ingress/egress. One
can assume bond0, br1, and hosts VM1 to VM# are all contained in a
single box, as depicted. Interfaces eth0 and eth1 provide redundant
connections to the data center with the requirement to use all bandwidth
when the system is functioning normally. Switch 1 and Switch 2 are
physical switches that do not implement any advanced L2 management
features such as MLAG, Cisco's VPC, or LACP.
Combining this feature with vlan+srcmac hash policy allows a user to
create an access network without the need to use expensive switches that
support features like Cisco's VCP.
[1] https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding
Co-developed-by: Long Xin <lxin@xxxxxxxxxx>
Signed-off-by: Long Xin <lxin@xxxxxxxxxx>
Signed-off-by: Jonathan Toppins <jtoppins@xxxxxxxxxx>
---
Notes:
v2:
* dropped needless abstraction functions and put code in module init
* renamed variable "rc" to "ret" to stay consistent with most of the
code
* fixed parameter setting management, when arp-monitor is turned on
this feature will be turned off similar to how miimon and arp-monitor
interact
* renamed bond_xor_recv to bond_mac_filter_recv for a little more
clarity
* it appears the implied default return code for any bonding recv probe
must be `RX_HANDLER_ANOTHER`. Changed the default return code of
bond_mac_filter_recv to use this return value to not break skb
processing when the skb dev is switched to the bond dev:
`skb->dev = bond->dev`
v3: Nik's comments
* clarified documentation
* fixed inline and basic reverse Christmas tree formatting
* zero'ed entry in mac_create
* removed read_lock taking in bond_mac_filter_recv
* made has_expired() atomic and removed critical sections
surrounding calls to has_expired(), this also removed the
use-after-free that would have occurred:
spin_lock_irqsave(&entry->lock, flags);
if (has_expired(bond, entry))
mac_delete(bond, entry);
spin_unlock_irqrestore(&entry->lock, flags); <---
* moved init/destroy of mac_filter_tbl to bond_open/bond_close
this removed the complex option dependencies, the only behavioural
change the user will see is if the bond is up and mac_filter is
enabled if they try and set arp_interval they will receive -EBUSY
* in bond_changelink moved processing of mac_filter option just below
mode processing
Documentation/networking/bonding.rst | 20 +++
drivers/net/bonding/Makefile | 2 +-
drivers/net/bonding/bond_mac_filter.c | 201 ++++++++++++++++++++++++++
drivers/net/bonding/bond_mac_filter.h | 37 +++++
drivers/net/bonding/bond_main.c | 30 ++++
drivers/net/bonding/bond_netlink.c | 13 ++
drivers/net/bonding/bond_options.c | 81 +++++++++--
drivers/net/bonding/bonding_priv.h | 1 +
include/net/bond_options.h | 1 +
include/net/bonding.h | 3 +
include/uapi/linux/if_link.h | 1 +
11 files changed, 373 insertions(+), 17 deletions(-)
create mode 100644 drivers/net/bonding/bond_mac_filter.c
create mode 100644 drivers/net/bonding/bond_mac_filter.h
[snip]
The same problem solved using a few nftables rules (in case you don't want to load eBPF):
$ nft 'add table netdev nt'
$ nft 'add chain netdev nt bond0EgressFilter { type filter hook egress device bond0 priority 0; }'
$ nft 'add chain netdev nt bond0IngressFilter { type filter hook ingress device bond0 priority 0; }'
$ nft 'add set netdev nt macset { type ether_addr; flags timeout; }'
$ nft 'add rule netdev nt bond0EgressFilter set update ether saddr timeout 5s @macset'
$ nft 'add rule netdev nt bond0IngressFilter ether saddr @macset counter drop'
I get the following when trying to apply this on a fedora 35 install.
root@fedora ~]# ip link add bond0 type bond mode balance-xor
xmit_hash_policy vlan+srcmac
[root@fedora ~]# nft 'add table netdev nt'
[root@fedora ~]# nft 'add chain netdev nt bond0EgressFilter { type
filter hook egress device bond0 priority 0; }'
Error: unknown chain hook
add chain netdev nt bond0EgressFilter { type filter hook egress device
bond0 priority 0; }
^^^^^^
[root@fedora ~]# uname -a
Linux fedora 5.17.5-200.fc35.x86_64 #1 SMP PREEMPT Thu Apr 28 15:41:41
UTC 2022 x86_64 x86_64 x86_64 GNU/Linux