Load Balancing WAN connections with nftables

Hey All,

I am working on an example of WAN load balancing with nftables (Xtables =
iptables/nftables).
I am unsure whether such a setup can work with flow offloading at all, and
if it can, how.

Also, is there any recommendation for managing and/or handling fail-over
with nftables?
I am not sure whether, or how, to implement a "fall back/throw" between
routes or marks.
Can I test whether a connection is up or down by some flag or other means
in nftables?
I was thinking about changing only the vmap and would be happy to hear other
ideas.
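
A rough sketch of the "change only the vmap" idea, untested against a live
ruleset. It assumes the chain names from the nftables script further down
(PCC_OUT_TCP, wan1, wan2); the helper name gen_pcc_tcp_batch is hypothetical,
and deciding which links are alive is left to an external check. The helper
only prints an nft batch; loading such a batch from a file with "nft -f"
applies it as one transaction.

```shell
#!/bin/sh
# Hypothetical helper: print an nft batch that rebuilds PCC_OUT_TCP so that
# NEW connections are hashed only across the WAN chains given as arguments.
gen_pcc_tcp_batch() {
    count=$#
    if [ "$count" -eq 0 ]; then
        echo "gen_pcc_tcp_batch: need at least one live WAN chain" >&2
        return 1
    fi

    echo "flush chain ip mangle PCC_OUT_TCP"

    # With a single live link there is nothing to hash: jump straight to it.
    if [ "$count" -eq 1 ]; then
        echo "add rule ip mangle PCC_OUT_TCP counter jump $1"
        return 0
    fi

    # Build "0 : jump wanA, 1 : jump wanB, ..." for the verdict map.
    elements=""
    i=0
    for chain in "$@"; do
        if [ -n "$elements" ]; then
            elements="$elements, "
        fi
        elements="${elements}${i} : jump ${chain}"
        i=$((i + 1))
    done

    echo "add rule ip mangle PCC_OUT_TCP counter jhash ip saddr . tcp sport . ip daddr . tcp dport mod ${count} vmap { ${elements} }"
}

# Both links up:
gen_pcc_tcp_batch wan1 wan2

# Only the second link up:
gen_pcc_tcp_batch wan2
```

Only NEW connections would be affected by such a swap. Connections that
already carry the ct mark of a failed link keep it, so they would still need
separate handling.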

For now the main points I have in mind are:
* Connection stickiness (packet/connection/flow) - Xtables CONNMARK/ct
mark/meta mark
* LB Ratio/weight - Xtables nth/other + ipset


The goal of the next lab is to "simulate" IPv4 WAN load balancing across N
WAN connections which will not tolerate per-packet LB.
Currently in the IPv4 world this means that different ISPs lease dynamic
IP addresses to the DSL, cable, or mobile/satellite link.
No TCP/UDP application will tolerate a "connection" (more than 2
bi-directional packets with some context) which switches its source IP
address.
This makes the basic Linux kernel route cache, which is either flow or
packet based, not reliable "enough" in many cases.

My testing environment is:
Client 1 - 192.168.125.100/24 gw 192.168.125.254
R1 - 192.168.125.254/24 , 192.168.126.100/24(Masquerade) with
192.168.111.0/24 via 192.168.126.202 or 192.168.126.203
R2 - 192.168.126.202/24, 192.168.111.92/24(Masquerade) 
R3 - 192.168.126.203/24 , 192.168.111.93/24(Masquerade)
WebServer( HTTP + HTTPS) - 192.168.111.96/24

All of the above run Alpine Linux stable 3.12 with kernel 5.4.43-1-virt.

With iptables it's simple enough to load balance across links with the
statistic module in nth mode.
The full script is at:
https://gist.github.com/elico/a993f07bb3cceade31ce08e35e97f3dd

The relevant iptables part is:
### Start
IPTABLES="/sbin/iptables"
LAN="eth0"
WAN="eth1"

$IPTABLES -t mangle -F PREROUTING

$IPTABLES -t mangle -F PCC_OUT
$IPTABLES -t mangle -N PCC_OUT

$IPTABLES -t mangle -F PCC_OUT_RET
$IPTABLES -t mangle -N PCC_OUT_RET

$IPTABLES -t mangle -F MARK_COUNTER
$IPTABLES -t mangle -N MARK_COUNTER


$IPTABLES -t mangle -A POSTROUTING -j CONNMARK --save-mark
$IPTABLES -t mangle -A POSTROUTING -o $LAN -m connmark ! --mark 0 -j MARK_COUNTER
$IPTABLES -t mangle -A POSTROUTING -o $WAN -m connmark ! --mark 0 -j MARK_COUNTER

$IPTABLES -t mangle -A PCC_OUT -m statistic --mode nth --every 4 --packet 0 -j CONNMARK --set-mark 1
$IPTABLES -t mangle -A PCC_OUT -m statistic --mode nth --every 4 --packet 1 -j CONNMARK --set-mark 2
$IPTABLES -t mangle -A PCC_OUT -m statistic --mode nth --every 4 --packet 2 -j CONNMARK --set-mark 1
$IPTABLES -t mangle -A PCC_OUT -m statistic --mode nth --every 4 --packet 3 -j CONNMARK --set-mark 2

$IPTABLES -t mangle -A PCC_OUT -m connmark --mark 0 -j PCC_OUT_RET


$IPTABLES -t mangle -A PREROUTING -m state --state ESTABLISHED,RELATED -j CONNMARK --restore-mark
$IPTABLES -t mangle -A PREROUTING -m connmark ! --mark 0 -j CONNMARK --save-mark

$IPTABLES -t mangle -A PREROUTING -i $LAN -m conntrack --ctstate NEW -j PCC_OUT

$IPTABLES -t mangle -A PREROUTING -m connmark --mark 0x1 -j MARK --set-mark 1
$IPTABLES -t mangle -A PREROUTING -m connmark --mark 0x2 -j MARK --set-mark 2
### End

The result would be round robin load balancing per NEW connection.
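
The marks only steer traffic once policy routing maps each mark to a
gateway. That part lives in the full script; a minimal sketch of it could
look like the following. The table numbers 101 and 102 are arbitrary
assumptions, the gateways are R2 and R3 from the lab above, and the helper
names run and setup_wan_table are hypothetical. By default the commands are
only printed; setting RUN=1 as root would execute them.

```shell
#!/bin/sh
# Dry-run wrapper: print the command unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Map one firewall mark to one routing table with its own default gateway.
# Arguments: mark, table number (assumed, arbitrary), gateway, device.
setup_wan_table() {
    mark=$1
    table=$2
    gateway=$3
    dev=$4
    run ip route replace default via "$gateway" dev "$dev" table "$table"
    run ip rule add fwmark "$mark" table "$table"
}

WAN="eth1"
setup_wan_table 1 101 192.168.126.202 "$WAN"   # mark 1 -> R2
setup_wan_table 2 102 192.168.126.203 "$WAN"   # mark 2 -> R3
```

Note that "ip rule add" does not replace an existing rule, so running this
repeatedly with RUN=1 would stack duplicate rules unless they are removed
first.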

The parts missing from the iptables version, which nftables provides, are:
consistent flow/tuple hash based load balancing and connection distribution

The basic docs lacked the example I was looking for:
https://wiki.nftables.org/wiki-nftables/index.php/Load_balancing

The script I wrote with nftables is at: 
https://gist.github.com/elico/492d8f75f584ec1bed98b2a054a02cbb

and the relevant nftables part is (assuming an empty nftables ruleset):
### Start
NFTABLES="/usr/sbin/nft"
LAN="eth0"
WAN="eth1"

#NAT
${NFTABLES} add table nat
${NFTABLES} add chain ip nat postrouting '{ type nat hook postrouting priority 100; policy accept; }'
${NFTABLES} add rule nat postrouting oif ${WAN} masquerade

${NFTABLES} add table mangle

${NFTABLES} add chain ip mangle prerouting '{ type filter hook prerouting priority -150; policy accept; }'
${NFTABLES} add chain ip mangle input '{ type filter hook input priority -150; policy accept; }'
${NFTABLES} add chain ip mangle forward '{ type filter hook forward priority -150; policy accept; }'
${NFTABLES} add chain ip mangle output '{ type route hook output priority -150; policy accept; }'
${NFTABLES} add chain ip mangle postrouting '{ type filter hook postrouting priority -150; policy accept; }'


${NFTABLES} add chain ip mangle wan1
${NFTABLES} add rule ip mangle wan1 counter ct mark set 0x1

${NFTABLES} add chain ip mangle wan2
${NFTABLES} add rule ip mangle wan2 counter ct mark set 0x2

# 5-tuple/flow/PCC (per connection classifier) load balancing
${NFTABLES} add chain ip mangle PCC_OUT_TCP
${NFTABLES} add rule ip mangle PCC_OUT_TCP counter jhash ip saddr . tcp sport . ip daddr . tcp dport mod 2 vmap { 0 : jump wan1, 1 : jump wan2 }

${NFTABLES} add chain ip mangle PCC_OUT_UDP
${NFTABLES} add rule ip mangle PCC_OUT_UDP counter jhash ip saddr . udp sport . ip daddr . udp dport mod 2 vmap { 0 : jump wan1, 1 : jump wan2 }

# Everything that is not TCP or UDP is hashed on the address pair only
${NFTABLES} add chain ip mangle PCC_OUT_OTHERS
${NFTABLES} add rule ip mangle PCC_OUT_OTHERS counter ip protocol { tcp, udp } return
${NFTABLES} add rule ip mangle PCC_OUT_OTHERS counter jhash ip saddr . ip daddr mod 2 vmap { 0 : jump wan1, 1 : jump wan2 }


${NFTABLES} add rule ip mangle prerouting counter meta mark set ct mark
${NFTABLES} add rule ip mangle prerouting ct mark != 0x0 counter ct mark set mark
${NFTABLES} add rule ip mangle prerouting iifname "${LAN}" ip protocol tcp ct state new counter jump PCC_OUT_TCP
${NFTABLES} add rule ip mangle prerouting iifname "${LAN}" ip protocol udp ct state new counter jump PCC_OUT_UDP
${NFTABLES} add rule ip mangle prerouting iifname "${LAN}" ct state new counter jump PCC_OUT_OTHERS


${NFTABLES} add rule ip mangle prerouting ct mark 0x1 counter meta mark set 0x1
${NFTABLES} add rule ip mangle prerouting ct mark 0x2 counter meta mark set 0x2

${NFTABLES} add rule ip mangle postrouting counter ct mark set mark
### End
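
For the ratio/weight point, one possible direction is to hash into a larger
modulus and give each link a range of buckets. The sketch below only builds
the "mod N vmap { ... }" tail of such a rule from chain:weight pairs. The
helper name gen_weighted_vmap is hypothetical, and that interval keys are
accepted in a jhash vmap the same way the wiki shows them for numgen maps is
an assumption I have not verified on this kernel.

```shell
#!/bin/sh
# Hypothetical helper: turn "chain:weight" pairs into the tail of a
# jhash rule, giving each chain a contiguous range of hash buckets.
gen_weighted_vmap() {
    total=0
    elements=""
    for spec in "$@"; do
        chain=${spec%%:*}    # text before the first ':'
        weight=${spec##*:}   # text after the last ':'
        first=$total
        last=$((total + weight - 1))
        # A weight of 1 yields a single bucket, otherwise a range.
        if [ "$first" -eq "$last" ]; then
            range="$first"
        else
            range="${first}-${last}"
        fi
        if [ -n "$elements" ]; then
            elements="$elements, "
        fi
        elements="${elements}${range} : jump ${chain}"
        total=$((total + weight))
    done
    echo "mod ${total} vmap { ${elements} }"
}

# 3:1 ratio between wan1 and wan2
gen_weighted_vmap wan1:3 wan2:1
# prints: mod 4 vmap { 0-2 : jump wan1, 3 : jump wan2 }
```

The printed text would replace the "mod 2 vmap { ... }" part of the
PCC_OUT_TCP, PCC_OUT_UDP and PCC_OUT_OTHERS rules above.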

There is another concept sketch which can be used in big networks for
consistent NAT with a set:
### Start
#...
${NFTABLES} add table mangle

${NFTABLES} add set mangle myset { type ipv4_addr . inet_service . ipv4_addr . inet_service \; flags dynamic,timeout \; timeout 10m10s \; }

${NFTABLES} add chain ip mangle prerouting '{ type filter hook prerouting priority -150; policy accept; }'

${NFTABLES} add rule mangle prerouting set add ip saddr . tcp sport . ip daddr . tcp dport @myset
${NFTABLES} add rule mangle prerouting set add ip saddr . udp sport . ip daddr . udp dport @myset
# ...
### End

The above will persist a connection mark selection per flow for about 10
minutes (the 10m10s set timeout), and when used correctly with ct mark set,
a connection will persist for as long as it lives.

Any thoughts or comments are more than welcome.

Thanks,
Eliezer

----
Eliezer Croitoru
Tech Support
Mobile: +972-5-28704261
Email: ngtech1ltd@xxxxxxxxx



