Here are the code-generated nftables rules for the NAT portion of the k8s proxy. They probably do not cover all cases, but on a normal k8s cluster they should be sufficient. I would appreciate reviews and suggestions for optimization.

Thank you very much.
Serguei

table ip ipv4table {
    chain nat-preroutin {
        type nat hook prerouting priority filter; policy accept;
        jump k8s-nat-services
    }

    chain nat-output {
        type nat hook output priority filter; policy accept;
        jump k8s-nat-services
    }

    chain nat-postrouting {
        type nat hook postrouting priority filter; policy accept;
        jump k8s-nat-postrouting
    }

    chain k8s-nat-mark-drop {
        meta mark set 0x00008000
    }

    chain k8s-nat-services {
        ip saddr != 57.112.0.0/12 ip daddr 57.142.221.21 tcp dport 80 meta mark set 0x00004000
        ip daddr 57.142.221.21 tcp dport 80 jump KUBE-SVC-57XVOCFNTLTR3Q27
        ip saddr != 57.112.0.0/12 ip daddr 57.142.35.114 tcp dport 15443 meta mark set 0x00004000
        ip daddr 57.142.35.114 tcp dport 15443 jump KUBE-SVC-S4S242M2WNFIAT6Y
        ip daddr 57.131.151.19 tcp dport 8989 jump KUBE-SVC-MUPXPVK4XAZHSWAR
        ip daddr 192.168.80.104 tcp dport 8989 meta mark set 0x00004000
        fib saddr type != local ip daddr 192.168.80.104 tcp dport 8989 iifname != "bridge*" jump KUBE-SVC-MUPXPVK4XAZHSWAR
        fib daddr type local ip daddr 192.168.80.104 tcp dport 8989 jump KUBE-SVC-MUPXPVK4XAZHSWAR
    }

    chain k8s-nat-nodeports {
        tcp dport 30725 meta mark set 0x00004000 jump KUBE-SVC-S4S242M2WNFIAT6Y
    }

    chain k8s-nat-postrouting {
        meta mark 0x00004000 masquerade random,persistent
    }

    chain KUBE-SVC-S4S242M2WNFIAT6Y {
        jump KUBE-SEP-CUAZ6PSSTEDPJ43V
    }

    chain KUBE-SVC-57XVOCFNTLTR3Q27 {
        numgen random mod 2 vmap { 0 : jump KUBE-SEP-FS3FUULGZPVD4VYB, 1 : jump KUBE-SEP-MMFZROQSLQ3DKOQA }
    }

    chain KUBE-SVC-MUPXPVK4XAZHSWAR {
        jump KUBE-SEP-LO6TEVOI6GV524F3
    }

    chain KUBE-SEP-CUAZ6PSSTEDPJ43V {
        ip saddr 57.112.0.244 meta mark set 0x00004000
        dnat to 57.112.0.244:15443 fully-random
    }

    chain KUBE-SEP-FS3FUULGZPVD4VYB {
        ip saddr 57.112.0.247 meta mark set 0x00004000
        dnat to 57.112.0.247:8080 fully-random
    }

    chain KUBE-SEP-MMFZROQSLQ3DKOQA {
        ip saddr 57.112.0.248 meta mark set 0x00004000
        dnat to 57.112.0.248:8080 fully-random
    }

    chain KUBE-SEP-LO6TEVOI6GV524F3 {
        ip saddr 57.112.0.250 meta mark set 0x00004000
        dnat to 57.112.0.250:38989 fully-random
    }
}

On 2019-12-04, 12:49 PM, "Serguei Bezverkhi (sbezverk)" <sbezverk@xxxxxxxxx> wrote:

Hello @Phil,

Just to confirm: if I do

    numgen random mod 3 vmap { 0 : jump endpoint1, 1 : jump endpoint2, 2 : jump endpoint3 }

then, if a 4th endpoint appears, I replace the previous rule with:

    numgen random mod 4 vmap { 0 : jump endpoint1, 1 : jump endpoint2, 2 : jump endpoint3, 3 : jump endpoint4 }

It should do the trick of load balancing, right?

@Arturo, I am not planning to use "dnat numgen random { 0-49 : <ip>:<port> }". Each endpoint will have its own chain, which will dnat to the endpoint's IP and its specific target port. The load balancing will be done in the service chain, between multiple endpoint chains. See the example above. Does it make sense?

Thank you
Serguei

On 2019-12-04, 12:31 PM, "Arturo Borrero Gonzalez" <arturo@xxxxxxxxxxxxx> wrote:

On 12/4/19 4:56 PM, Phil Sutter wrote:
> OK, static load-balancing between two services - no big deal. :)
>
> What happens if config changes? I.e., if one of the endpoints goes down
> or a third one is added? (That's the thing we're discussing right now,
> aren't we?)

If the non-anon map for random numgen were allowed, then only the elements would need to be adjusted:

    dnat numgen random mod 100 map { 0-49 : 1.1.1.1, 50-99 : 2.2.2.2 }

You could always use mod 100 (or 10000 if you want) and just play with the map probabilities by updating map elements. This is a valid use case, I think. The mod number can just be the max number of allowed endpoints per service in kubernetes.

@Phil, I'm not sure if the typeof() thingy will work in this case, since the integer length would depend on the mod value used. What about introducing something like an explicit u128 integer datatype?
Perhaps it's useful for other use cases too...

@Serguei, kubernetes implements a complex chain of mechanisms to deal with traffic. What happens if endpoints for a given svc have different ports? I don't know if that's supported or not, but if it is, then this approach wouldn't work either: you can't use dnat numgen random { 0-49 : <ip>:<port> }.

Also, we have the masquerade/drop thing going on too, which needs to be dealt with, and that is currently done by yet another chain jump + packet mark.

I'm not sure what stage of development you are at, but this is my suggestion: try not to over-optimize in the first iteration. Just get a working nft ruleset with the few optimizations that make sense and are easy to use (and understand). For iteration #2 we can do better optimizations, including patching missing features we may have in nftables.

I really want a ruleset with very few rules, but we are still comparing with the iptables ruleset. I suggest we leave the hard optimization for a later point, when we are comparing nft vs nft rulesets.
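
For illustration, the rule replacement Serguei describes (regenerating the service-chain rule whenever an endpoint is added or removed) could be sketched roughly as below. This is only a minimal sketch: the generator code is not shown in this thread, so the function name build_vmap_rule and the placeholder chain names are hypothetical. It produces a plain jump for a single endpoint and a "numgen random mod N vmap" rule otherwise, which mirrors the shape of the service chains in the ruleset above.

```python
def build_vmap_rule(endpoint_chains):
    """Build the rule body for a service chain (hypothetical helper).

    One endpoint: a plain jump, as in the single-endpoint KUBE-SVC chains.
    Several endpoints: a 'numgen random mod N vmap' rule that picks one of
    the endpoint chains uniformly at random.
    """
    if not endpoint_chains:
        raise ValueError("at least one endpoint chain is required")
    if len(endpoint_chains) == 1:
        # No load balancing needed for a single endpoint.
        return f"jump {endpoint_chains[0]}"
    elements = ", ".join(
        f"{index} : jump {chain}" for index, chain in enumerate(endpoint_chains)
    )
    return f"numgen random mod {len(endpoint_chains)} vmap {{ {elements} }}"


# Rule for three endpoints, then the replacement rule once a 4th appears.
print(build_vmap_rule(["endpoint1", "endpoint2", "endpoint3"]))
print(build_vmap_rule(["endpoint1", "endpoint2", "endpoint3", "endpoint4"]))
```

With this approach the whole rule is replaced on every endpoint change, in contrast to the named-map idea above, where only the map elements would be updated.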