On Wed, Jan 9, 2013 at 4:52 AM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> Hi Willem,
>
> On Tue, Jan 08, 2013 at 08:58:37PM -0500, Willem de Bruijn wrote:
>> On Mon, Jan 7, 2013 at 10:21 PM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
>> > Hi Willem,
>> >
>> > On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote:
>> >> Support arbitrary linux socket filter (BPF) programs as iptables
>> >> match rules. This allows for very expressive filters, and on
>> >> platforms with BPF JIT appears competitive with traditional hardcoded
>> >> iptables rules.
>> >>
>> >> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without
>> >> any iptables rules (40 GBps),
>> >>
>> >> inserting 100x this bpf rule gives 28K
>> >>
>> >> ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j
>> >>
>> >> (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',')
>> >>
>> >> inserting 100x this u32 rule gives 21K
>> >>
>> >> ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP
>> >>
>> >> The two are logically equivalent, as far as I can tell. Let me know
>> >> if my test methodology is flawed in some way. Even in cases where
>> >> slower, the filter adds functionality currently lacking in iptables,
>> >> such as access to sk_buff fields like rxhash and queue_mapping.
>> >>
>> >> Signed-off-by: Willem de Bruijn <willemb@xxxxxxxxxx>
>> >> ---
>> >>  include/linux/netfilter/xt_bpf.h | 17 +++++++
>> >>  net/netfilter/Kconfig | 9 ++++
>> >>  net/netfilter/Makefile | 1 +
>> >>  net/netfilter/x_tables.c | 5 +-
>> >>  net/netfilter/xt_bpf.c | 86 ++++++++++++++++++++++++++++++++++++++
>> >>  5 files changed, 116 insertions(+), 2 deletions(-)
>> >>  create mode 100644 include/linux/netfilter/xt_bpf.h
>> >>  create mode 100644 net/netfilter/xt_bpf.c
>> >>
>> >> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
>> >> new file mode 100644
>> >> index 0000000..23502c0
>> >> --- /dev/null
>> >> +++ b/include/linux/netfilter/xt_bpf.h
>> >> @@ -0,0 +1,17 @@
>> >> +#ifndef _XT_BPF_H
>> >> +#define _XT_BPF_H
>> >> +
>> >> +#include <linux/filter.h>
>> >> +#include <linux/types.h>
>> >> +
>> >> +struct xt_bpf_info {
>> >> +	__u16 bpf_program_num_elem;
>> >> +
>> >> +	/* only used in kernel */
>> >> +	struct sk_filter *filter __attribute__((aligned(8)));
>> >
>> > I see. You set match->userspacesize to zero in libxt_bpf to skip the
>> > comparison of that internal struct sk_filter *filter.
>> >
>> >> +
>> >> +	/* variable size, based on program_num_elem */
>> >> +	struct sock_filter bpf_program[0];
>> >
>> > While testing this I noticed:
>> >
>> > iptables -I OUTPUT -m bpf --bytecode \
>> > '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT
>> >
>> > Note that this works but it should not.
>> >
>> > iptables -D OUTPUT -m bpf --bytecode \
>> > '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT
>> >                                                        ^
>> > Mind that 1, it's a different filter, but it deletes the previous
>> > filter without problems here.
>> >
>> > A quick look at make_delete_mask() in iptables tells me that the
>> > changes you made to userspace to allow variable size matches are not
>> > enough to generate a sane mask (which is fundamental while looking for
>> > a matching rule during the deletion).
>>
>> Thanks for finding this, Pablo. I completely forgot to check that.
>>
>> I've never looked at that deletion code before. Will read it and
>> hopefully propose a simple fix in a few days. An earlier version of
>> the patch used a statically sized struct, by the way, like xt_string
>> does (XT_STRING_MAX_PATTERN_SIZE). If it is easier to
>> incorporate, we can always revert to that.
>
> I prefer if this sticks to static size by now.

Okay. That is actually a lot simpler.

> The problem is that
> BPF_MAXINSNS is probably too much to allocate per rule. So you'll have
> to limit this to some reasonable amount of lines in the filter.
> Please, also check that iptables-save and iptables-restore work fine,
> there is also some problem with the existing code.

Done. I'll send updated patches right after this. Verified that they
work using

## test append

# fail: more rules than num_rules
./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,6 0 0 0' -j ACCEPT

# fail: fewer rules than num_rules
./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96' -j ACCEPT

# pass: correct
./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT

## test delete

# fail: differs
./iptables -D OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 1' -j ACCEPT

# pass: same
./iptables -D OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT

## test save/restore

./iptables-save > out && cat out && ./iptables-restore < out && echo "OK"

I did not retest the datapath for this revision.
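
For anyone following along, the append and delete checks above can be
sketched in userspace C. This is only an illustration and not the
libxt_bpf code: the names sketch_insn, sketch_bpf_info, sketch_parse and
sketch_same_rule are made up, the cap SKETCH_BPF_MAX_INSNS (64 here) is
an arbitrary assumption, and it assumes the statically sized layout
discussed above, where a zero-filled struct can be compared bytewise
when looking for the rule to delete.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-rule cap; the real limit is up to the patch. */
#define SKETCH_BPF_MAX_INSNS 64

/* Same field layout as struct sock_filter in <linux/filter.h>. */
struct sketch_insn {
	unsigned short code;
	unsigned char jt;
	unsigned char jf;
	unsigned int k;
};

/* Hypothetical statically sized match data (not the real xt_bpf_info). */
struct sketch_bpf_info {
	unsigned short num_elem;
	struct sketch_insn insns[SKETCH_BPF_MAX_INSNS];
};

/*
 * Parse "N,code jt jf k,code jt jf k,..." into *info.
 * Returns 0 on success, -1 if the string is malformed or the number of
 * instructions found differs from the leading count N.
 */
static int sketch_parse(const char *s, struct sketch_bpf_info *info)
{
	unsigned int declared, code, jt, jf, k;
	int consumed, count = 0;

	assert(s != NULL && info != NULL);

	/* Zero everything so unused slots and padding compare equal. */
	memset(info, 0, sizeof(*info));

	if (sscanf(s, "%u%n", &declared, &consumed) != 1)
		return -1;
	if (declared == 0 || declared > SKETCH_BPF_MAX_INSNS)
		return -1;
	s += consumed;

	while (*s == ',') {
		s++;
		while (*s == ' ')
			s++;
		if (*s == '\0')		/* tolerate a trailing comma */
			break;
		if (sscanf(s, "%u %u %u %u%n",
			   &code, &jt, &jf, &k, &consumed) != 4)
			return -1;
		if (code > 0xffff || jt > 0xff || jf > 0xff)
			return -1;
		if (count >= SKETCH_BPF_MAX_INSNS)
			return -1;
		info->insns[count].code = (unsigned short)code;
		info->insns[count].jt = (unsigned char)jt;
		info->insns[count].jf = (unsigned char)jf;
		info->insns[count].k = k;
		count++;
		s += consumed;
	}

	if (*s != '\0' || count != (int)declared)
		return -1;
	info->num_elem = (unsigned short)count;
	return 0;
}

/* Bytewise comparison, valid because sketch_parse() zero-fills first. */
static int sketch_same_rule(const struct sketch_bpf_info *a,
			    const struct sketch_bpf_info *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Fed the strings from the tests, the 7- and 5-instruction programs are
rejected because the leading count says 6, and the program ending in
'6 0 0 1' parses but does not compare equal to the one ending in
'6 0 0 0'.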