Hi,

I guess we had a very similar conversation with the sig-network guys.
Please see below some comments.

On Thu, Nov 28, 2019 at 2:22 AM Serguei Bezverkhi (sbezverk)
<sbezverk@xxxxxxxxx> wrote:
>
> Hello Phil,
>
> Please see below the list of nftables rules the code generates to mimic only the filter chain portion of kube-proxy.
>
> Here is the location of the code programming these rules:
> https://github.com/sbezverk/nftableslib-samples/blob/master/proxy/mimic-filter/mimic-filter.go
>
> Most of the rules are static and will be programmed just once when the proxy comes up, with the exception of 2 rules in the k8s-filter-services chain: the referenced list of ports can change. Ideally it would be great to express these two rules as a single rule with a vmap whose key is the service's IP AND service port, as it is possible to have a single service IP associated with several ports, where some of these ports have an endpoint and some do not. So far I could not figure it out. I'd appreciate your thoughts/suggestions/criticism.

If you could file an issue for anything you feel needs to be discussed,
that would be great.

> sudo nft list table ipv4table
> table ip ipv4table {
>         set svc1-no-endpoints {
>                 type inet_service
>                 elements = { 8989 }
>         }
>
>         chain filter-input {
>                 type filter hook input priority filter; policy accept;
>                 ct state new jump k8s-filter-services
>                 jump k8s-filter-firewall
>         }
>
>         chain filter-output {
>                 type filter hook output priority filter; policy accept;
>                 ct state new jump k8s-filter-services
>                 jump k8s-filter-firewall
>         }
>
>         chain filter-forward {
>                 type filter hook forward priority filter; policy accept;
>                 jump k8s-filter-forward
>                 ct state new jump k8s-filter-services
>         }
>
>         chain k8s-filter-ext-services {
>         }
>
>         chain k8s-filter-firewall {
>                 meta mark 0x00008000 drop
>         }
>
>         chain k8s-filter-services {
>                 ip daddr 192.168.80.104 tcp dport @svc1-no-endpoints reject with icmp type host-unreachable
>                 ip daddr 57.131.151.19 tcp dport @svc1-no-endpoints reject with icmp type host-unreachable
>         }

Here you're going to have the same problems as with iptables: lack of
scalability and complexity during rule removal.

In nftlb we create maps, and with the same rules you only have to take
care of inserting and removing elements in them. Some extensive
examples are here:
https://github.com/zevenet/nftlb/tree/master/tests

Regarding the ip : port NATting, it is not possible to use two maps,
because you would need a numgen for each of them and they would come up
with different numbers. (Sketches of both points follow at the end of
this message, after the quoted thread.)

Cheers.

>         chain k8s-filter-forward {
>                 ct state invalid drop
>                 meta mark 0x00004000 accept
>                 ip saddr 57.112.0.0/12 ct state established,related accept
>                 ip daddr 57.112.0.0/12 ct state established,related accept
>         }
> }
>
> Thank you
> Serguei
>
> On 2019-11-27, 12:22 PM, "n0-1@xxxxxxxxxxxxx on behalf of Phil Sutter" <n0-1@xxxxxxxxxxxxx on behalf of phil@xxxxxx> wrote:
>
>     Hi,
>
>     On Wed, Nov 27, 2019 at 04:50:56PM +0000, Serguei Bezverkhi (sbezverk) wrote:
>     > According to the API folks, kube-proxy must sustain the 5k-or-so test, otherwise it will never see a production environment. Implementing the numgen expression is relatively simple thanks to "nft --debug all"; once it's done, a user can use it as easily as with json :)
>     >
>     > Regarding concurrent usage, since my primary goal is kube-proxy I do not really care at this moment, as a k8s cluster is not an application you co-locate in production with other applications that might alter the host's tables.
>     > I agree firewalld might be an interesting and more generic alternative, but seeing how quickly things are done in k8s, maybe it will be done by the end of the 21st century :)
>
>     I agree, in a dedicated setup there's no need for compromises. I guess if
>     you manage to reduce ruleset changes to mere set element modifications,
>     you could outperform iptables in that regard. Run-time performance of
>     the resulting ruleset will obviously benefit from set/map use, as there
>     are far fewer rules to traverse for each packet.
>
>     > Once I get the filter chain portion in the code I will share a link to the repo so you can review.
>
>     Thanks! I'm also interested in seeing whether there are any
>     inconveniences due to nftables limitations. Maybe some problems are
>     easier solved on the kernel side.
>
>     Cheers, Phil
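
For the question above about keying on the service IP and port together, here is a
rough sketch (an illustration, not something taken from the thread) of how the two
reject rules in k8s-filter-services could collapse into a single rule over a set
with a concatenated key. The set name svc-no-endpoints is made up, the addresses
and port are the ones from the ruleset above, and it assumes an nftables/kernel
combination that supports concatenated set keys here:

    table ip ipv4table {
            # one set holds every (service address, service port) pair that
            # currently has no endpoints
            set svc-no-endpoints {
                    type ipv4_addr . inet_service
                    elements = { 192.168.80.104 . 8989,
                                 57.131.151.19 . 8989 }
            }

            chain k8s-filter-services {
                    # single rule: match the daddr/dport concatenation against the set
                    ip daddr . tcp dport @svc-no-endpoints reject with icmp type host-unreachable
            }
    }

With that layout, service churn becomes element updates instead of rule changes,
for example:

    nft add element ip ipv4table svc-no-endpoints { 192.168.80.104 . 8989 }
    nft delete element ip ipv4table svc-no-endpoints { 192.168.80.104 . 8989 }

A verdict map keyed the same way (ipv4_addr . inet_service : verdict) would be the
next step if different services need different actions, provided the installed
version supports it.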
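
On the ip : port NAT point, a sketch of one way to keep the chosen address and port
tied to the same backend with a single numgen: let one random index pick a verdict
into a per-endpoint chain that performs the whole dnat. The table name, chain names,
backend addresses and port below are hypothetical; only 192.168.80.104 and 8989 are
taken from the ruleset above:

    table ip k8s-nat-sketch {
            chain prerouting {
                    type nat hook prerouting priority -100; policy accept;
                    # one numgen picks the backend; address and port then come from
                    # the same per-endpoint chain, so they cannot disagree
                    ip daddr 192.168.80.104 tcp dport 8989 numgen random mod 2 vmap { 0 : jump svc1-ep0, 1 : jump svc1-ep1 }
            }

            chain svc1-ep0 {
                    dnat to 10.244.0.5:8080
            }

            chain svc1-ep1 {
                    dnat to 10.244.0.6:8080
            }
    }

If the installed version supports concatenations in NAT mappings, the per-endpoint
chains could likely be folded into a single "dnat to numgen ... map" rule whose
values are address . port pairs, keeping it down to one rule per service.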