Hi Pablo,

On Wed, Oct 21, 2020 at 12:49:52PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Oct 21, 2020 at 12:43:21PM +0200, Pablo Neira Ayuso wrote:
> > Hi Phil,
> >
> > On Fri, Oct 02, 2020 at 11:00:33AM +0200, Phil Sutter wrote:
> > > Hi Florian,
> > >
> > > On Fri, Oct 02, 2020 at 12:25:36AM +0200, Florian Westphal wrote:
> > > > Phil Sutter <phil@xxxxxx> wrote:
> > > > > The following two patches improve packet throughput in a test setup
> > > > > sending UDP packets (using iperf3) between two netns. The ruleset used
> > > > > on receiver side is like this:
> > > > >
> > > > > | *filter
> > > > > | :test - [0:0]
> > > > > | -A INPUT -j test
> > > > > | -A INPUT -j ACCEPT
> > > > > | -A test ! -s 10.0.0.0/10 -j DROP # this line repeats 10000 times
> > > > > | COMMIT
> > > > >
> > > > > These are the generated VM instructions for each rule:
> > > > >
> > > > > | [ payload load 4b @ network header + 12 => reg 1 ]
> > > > > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > > >
> > > > Not related to this patch, but we should avoid the bitop if the
> > > > netmask is divisible by 8 (can adjust the cmp -- adjusting the
> > > > payload expr is probably not worth it).
> > >
> > > See the patch I just sent to this list. I adjusted both - it simply
> > > didn't appear to me that I could get by with reducing the cmp expression
> > > size only. The upside though is that detecting the prefix match based on
> > > payload expression length is quick and easy.
> > >
> > > Someone will have to adjust nft tool, though. ;)
> > >
> > > > > | [ cmp eq reg 1 0x0000000a ]
> > > > > | [ counter pkts 0 bytes 0 ]
> > > >
> > > > Out of curiosity, does omitting 'counter' help?
> > > >
> > > > nft counter is rather expensive due to bh disable,
> > > > iptables does it once at the evaluation loop only.
> > >
> > > I changed the test to create the base ruleset using iptables-nft-restore
> > > just as before, but create the rules in 'test' chain like so:
> > >
> > > | nft add rule filter test ip saddr != 10.0.0.0/10 drop
> > >
> > > The VM code is as expected:
> > >
> > > | [ payload load 4b @ network header + 12 => reg 1 ]
> > > | [ bitwise reg 1 = (reg=1 & 0x0000c0ff ) ^ 0x00000000 ]
> > > | [ cmp eq reg 1 0x0000000a ]
> > > | [ immediate reg 0 drop ]
> > >
> > > Performance is ~7000pkt/s. So while it's faster than iptables-nft, it's
> > > still quite a bit slower than legacy iptables despite the skipped
> > > counters.
> >
> > iptables is optimized for matching on input/output device name and
> > IPv4 address + mask (see ip_packet_match()) for historical reasons,
> > iptables does not use a match for this since the beginning.

Ah, thanks for the pointer. That function (and the code therein) pretty
clearly shows why rule-shredding is so much slower in iptables-nft than
legacy despite the attempts at improving it.

> For clarity here, I mean: iptables does not use the generic match
> infrastructure for matching on these fields, instead it is using
> ip_packet_match() which is called from ipt_do_table() which is the
> core function that evaluates the packet.
>
> > One possibility (in the short-term) is to add an internal kernel
> > expression to achieve the same behaviour. The kernel needs to detect
> > for:
> >
> > payload (nh, offset to ip saddr or ip daddr or ip protocol) + cmp
> > payload (nh, offset to ip saddr or ip daddr) + bitwise + cmp
> > meta (iifname or oifname) + bitwise + cmp
> > meta (iifname or oifname) + cmp
> >
> > at the very beginning of the rule.
> >
> > and squash these expressions into the "built-in" iptables match
> > expression which emulates ip_packet_match().
> >
> > Not nice, but if microbenchmarks using thousands of rules really matter
> > (this is worst case O(n) linear list evaluation...) then it might make
> > sense to explore this.
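
For illustration, the squashing described above could look roughly like
the userspace sketch below. This is not kernel code: struct expr,
eval_exprs(), squash_saddr() and fast_saddr_match() are made-up names,
addresses are kept in host byte order for simplicity, and only the
payload + bitwise + cmp pattern on the source address is handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified stand-ins for VM expressions. */
enum expr_type { EXPR_PAYLOAD, EXPR_BITWISE, EXPR_CMP };

struct expr {
	enum expr_type type;
	uint32_t arg;	/* BITWISE: mask, CMP: value, PAYLOAD: unused */
	bool invert;	/* CMP only: match on "not equal" */
};

/* Hypothetical squashed form: one mask-and-compare on the source address. */
struct fast_saddr {
	uint32_t addr;
	uint32_t mask;
	bool invert;
};

/* Netmask for a prefix length, host byte order. */
static uint32_t prefix_mask(unsigned int prefix_len)
{
	assert(prefix_len <= 32);
	return prefix_len ? (uint32_t)(~UINT32_C(0) << (32 - prefix_len)) : 0;
}

/* Generic path: dispatch on every expression, one after the other. */
static bool eval_exprs(const struct expr *e, size_t n, uint32_t saddr)
{
	uint32_t reg = 0;

	for (size_t i = 0; i < n; i++) {
		switch (e[i].type) {
		case EXPR_PAYLOAD:
			reg = saddr;		/* "payload load" */
			break;
		case EXPR_BITWISE:
			reg &= e[i].arg;	/* "bitwise" with XOR value 0 */
			break;
		case EXPR_CMP:
			if ((reg == e[i].arg) == e[i].invert)
				return false;	/* mismatch, rule does not apply */
			break;
		}
	}
	return true;
}

/* Detect payload + bitwise + cmp at the very beginning of the rule. */
static bool squash_saddr(const struct expr *e, size_t n,
			 struct fast_saddr *out)
{
	if (n < 3 || e[0].type != EXPR_PAYLOAD ||
	    e[1].type != EXPR_BITWISE || e[2].type != EXPR_CMP)
		return false;

	out->mask = e[1].arg;
	out->addr = e[2].arg;
	out->invert = e[2].invert;
	return true;
}

/* Fast path: a single test instead of three dispatched expressions. */
static bool fast_saddr_match(const struct fast_saddr *m, uint32_t saddr)
{
	return ((saddr & m->mask) == m->addr) != m->invert;
}
```

For a rule like the one in the test ('! -s 10.0.0.0/10'), both paths
should give the same verdict. The squashed form merely replaces three
dispatched expressions by one mask-and-compare, similar in spirit to what
ip_packet_match() does for the source address.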

I appreciate the effort to identify a solution which "just works",
though am not sure if we really should implement such hacks (yet). That
said, the "fast" expressions strictly speaking are hacks as well ...

Cheers, Phil