On Wed, Dec 5, 2012 at 2:48 PM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> Hi Willem,
>
> On Wed, Dec 05, 2012 at 02:22:19PM -0500, Willem de Bruijn wrote:
>> A new match that executes sk_run_filter on every packet. BPF filters
>> can access skbuff fields that are out of scope for existing iptables
>> rules, allow more expressive logic, and on platforms with JIT support
>> can even be faster.
>>
>> I have a corresponding iptables patch that takes `tcpdump -ddd`
>> output, as used in the examples below. The two parts communicate
>> using a variable-length structure. This is similar to ebt_among,
>> but new for iptables.
>>
>> Verified functionality by inserting an IP source filter on chain
>> INPUT and an IP destination filter on chain OUTPUT and noting that
>> ping failed while a rule was active:
>>
>> iptables -v -A INPUT -m bpf --bytecode '4,32 0 0 12,21 0 1 $SADDR,6 0 0 96,6 0 0 0,' -j DROP
>> iptables -v -A OUTPUT -m bpf --bytecode '4,32 0 0 16,21 0 1 $DADDR,6 0 0 96,6 0 0 0,' -j DROP
>
> I like this BPF idea for iptables.
>
> I made a similar extension some time ago, but it took a file as a
> parameter. That file contained BPF code. I made a simple bison
> parser that takes BPF code and puts it into the bpf array of
> instructions. It would make defining a filter a bit more intuitive,
> and we could distribute it with iptables.

That's cleaner, indeed. I actually like how tcpdump operates as a code
generator if you pass -ddd. Unfortunately, it generates code only for
the link-layer types of its supported devices, such as DLT_EN10MB and
DLT_LINUX_SLL. The network-layer interface of basic iptables (ignoring
device-dependent mechanisms such as the one used in xt_mac) is DLT_RAW,
but that link type is rarely supported.

> Let me check on my internal trees, I can put that user-space code
> somewhere in case you're interested.

Absolutely. I'll be happy to revise to get it in.
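
For readers following along: the `--bytecode` strings in the examples above are `tcpdump -ddd` output with newlines turned into commas, that is, an instruction count followed by one `code jt jf k` tuple per instruction. `$SADDR` and `$DADDR` stand for an IPv4 address written as a decimal 32-bit integer (127.0.0.1 is 2130706433); inside single quotes the shell will not expand them, so they have to be substituted before running the command. The C sketch below shows one plausible way a userspace parser could turn such a string into an array with the same layout as `struct sock_filter`. It is not the parser from the iptables patch; `parse_bytecode()` and `struct bpf_insn_sketch` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical mirror of struct sock_filter in <linux/filter.h>. */
struct bpf_insn_sketch {
	uint16_t code;	/* opcode, e.g. 32 = BPF_LD|BPF_W|BPF_ABS */
	uint8_t jt;	/* instructions to skip if the test is true */
	uint8_t jf;	/* instructions to skip if the test is false */
	uint32_t k;	/* operand: offset, constant or return value */
};

/*
 * Hypothetical parser for "N,code jt jf k,code jt jf k,...," strings.
 * Returns the number of instructions written to prog, or -1 on error.
 */
static int parse_bytecode(const char *str, struct bpf_insn_sketch *prog, int max)
{
	char *end;
	long n = strtol(str, &end, 10);
	int i;

	if (end == str || *end != ',' || n <= 0 || n > max)
		return -1;
	str = end + 1;

	for (i = 0; i < n; i++) {
		unsigned int code, jt, jf;
		unsigned long k;
		int used = 0;

		/* %n records how many characters this tuple consumed */
		if (sscanf(str, "%u %u %u %lu%n", &code, &jt, &jf, &k, &used) != 4)
			return -1;
		if (code > 0xFFFF || jt > 0xFF || jf > 0xFF || k > 0xFFFFFFFFUL)
			return -1;
		prog[i].code = (uint16_t)code;
		prog[i].jt = (uint8_t)jt;
		prog[i].jf = (uint8_t)jf;
		prog[i].k = (uint32_t)k;

		str += used;
		if (*str != ',')	/* every tuple, including the last, ends in ',' */
			return -1;
		str++;
	}
	return *str == '\0' ? (int)n : -1;
}
```

For example, `parse_bytecode("4,32 0 0 12,21 0 1 2130706433,6 0 0 96,6 0 0 0,", prog, 8)` fills four instructions and returns 4, while a count that does not match the number of tuples yields -1.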
I'm also considering sending a patch to tcpdump to make it generate
code independent of the installed hardware when -y is specified.

>> Evaluated throughput by running netperf TCP_STREAM over loopback on
>> x86_64. I expected the BPF filter to outperform hardcoded iptables
>> filters when replacing multiple matches with a single bpf match, but
>> even a bpf match that replaces a single u32 comparison appears to do
>> better. Relative to the benchmark with no filter applied, throughput
>> with 100 BPF filters dropped to 81%. With 100 u32 filters it dropped
>> to 55%. The difference sounds excessive to me, but was consistent on
>> my hardware. Commands used:
>>
>> for i in `seq 100`; do iptables -A OUTPUT -m bpf --bytecode '4,48 0 0 9,21 0 1 20,6 0 0 96,6 0 0 0,' -j DROP; done
>> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
>>
>> iptables -F OUTPUT
>>
>> for i in `seq 100`; do iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP; done
>> for i in `seq 3`; do netperf -t TCP_STREAM -I 99 -H localhost; done
>>
>> FYI: perf top
>>
>> [bpf]
>>     33.94%  [kernel]     [k] copy_user_generic_string
>>      8.92%  [kernel]     [k] sk_run_filter
>>      7.77%  [ip_tables]  [k] ipt_do_table
>>
>> [u32]
>>     22.63%  [kernel]     [k] copy_user_generic_string
>>     14.46%  [kernel]     [k] memcpy
>>      9.19%  [ip_tables]  [k] ipt_do_table
>>      8.47%  [xt_u32]     [k] u32_mt
>>      5.32%  [kernel]     [k] skb_copy_bits
>>
>> The big difference appears to be in memory copying. I have not
>> looked into u32, so I cannot explain this right now. More
>> interestingly, at a higher rate, sk_run_filter appears to use as many
>> cycles as u32_mt (both traces have roughly the same number of events).
>>
>> One caveat: to work independently of the device link layer, the
>> filter expects DLT_RAW-style BPF programs, i.e., those that expect the
>> packet to start at the IP layer.
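
To make the DLT_RAW caveat concrete: since the program sees the packet starting at the IPv4 header, `32 0 0 12` (BPF_LD|BPF_W|BPF_ABS, k = 12) loads the source address, `32 0 0 16` the destination address, and `48 0 0 9` (BPF_LD|BPF_B|BPF_ABS) the one-byte protocol field. The sketch below is a toy interpreter covering only the four opcodes that appear in these rules, plus a rough model of the u32 test `6&0xFF=VALUE` for comparison. It is not sk_run_filter() or u32_mt(); `run_filter_sketch()` and `u32_match_sketch()` are hypothetical helpers written for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of struct sock_filter in <linux/filter.h>. */
struct cbpf_insn {
	uint16_t code;
	uint8_t jt, jf;
	uint32_t k;
};

/* The only classic BPF opcodes used by the example rules. */
#define OP_LD_W_ABS 0x20	/* 32: A = big-endian 32-bit word at offset k */
#define OP_LD_B_ABS 0x30	/* 48: A = byte at offset k */
#define OP_JEQ_K    0x15	/* 21: skip jt insns if A == k, else skip jf */
#define OP_RET_K    0x06	/*  6: return k (non-zero means "match") */

/* Toy interpreter; pkt points at the IP header, as in a DLT_RAW program. */
static uint32_t run_filter_sketch(const struct cbpf_insn *prog, size_t len,
				  const uint8_t *pkt, size_t pktlen)
{
	uint32_t A = 0;	/* accumulator */
	size_t pc;

	for (pc = 0; pc < len; pc++) {
		const struct cbpf_insn *in = &prog[pc];

		switch (in->code) {
		case OP_LD_W_ABS:
			if (in->k > pktlen || pktlen - in->k < 4)
				return 0;	/* out-of-bounds load: treat as no match */
			A = (uint32_t)pkt[in->k] << 24 |
			    (uint32_t)pkt[in->k + 1] << 16 |
			    (uint32_t)pkt[in->k + 2] << 8 |
			    (uint32_t)pkt[in->k + 3];
			break;
		case OP_LD_B_ABS:
			if (in->k >= pktlen)
				return 0;
			A = pkt[in->k];
			break;
		case OP_JEQ_K:
			/* offsets are relative to the next instruction (loop adds 1) */
			pc += (A == in->k) ? in->jt : in->jf;
			break;
		case OP_RET_K:
			return in->k;
		default:
			return 0;	/* opcode not handled by this sketch */
		}
	}
	return 0;	/* fell off the end of the program */
}

/* Rough model of the u32 test '6&0xFF=value' on the same buffer. */
static int u32_match_sketch(const uint8_t *pkt, size_t pktlen, uint32_t value)
{
	uint32_t word;

	if (pktlen < 10)
		return 0;
	word = (uint32_t)pkt[6] << 24 | (uint32_t)pkt[7] << 16 |
	       (uint32_t)pkt[8] << 8 | (uint32_t)pkt[9];
	return (word & 0xFF) == value;	/* low byte is the protocol at offset 9 */
}
```

Both helpers read the same byte: u32 loads the big-endian word at offset 6 and keeps its low byte, which is offset 9. Note that the two benchmark rules compare it against different constants, decimal 20 in the BPF program and 0x20 (32) in the u32 rule. Neither equals IPPROTO_TCP (6), so neither rule set should drop netperf's TCP_STREAM traffic, and both runs appear to measure rule evaluation cost only.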
>> ---
>>  include/linux/netfilter/xt_bpf.h |   17 +++++++
>>  net/netfilter/Kconfig            |    9 ++++
>>  net/netfilter/Makefile           |    1 +
>>  net/netfilter/x_tables.c         |    5 +-
>>  net/netfilter/xt_bpf.c           |   88 ++++++++++++++++++++++++++++++++++++++
>>  5 files changed, 118 insertions(+), 2 deletions(-)
>>  create mode 100644 include/linux/netfilter/xt_bpf.h
>>  create mode 100644 net/netfilter/xt_bpf.c
>>
>> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
>> new file mode 100644
>> index 0000000..23502c0
>> --- /dev/null
>> +++ b/include/linux/netfilter/xt_bpf.h
>> @@ -0,0 +1,17 @@
>> +#ifndef _XT_BPF_H
>> +#define _XT_BPF_H
>> +
>> +#include <linux/filter.h>
>> +#include <linux/types.h>
>> +
>> +struct xt_bpf_info {
>> +	__u16 bpf_program_num_elem;
>> +
>> +	/* only used in kernel */
>> +	struct sk_filter *filter __attribute__((aligned(8)));
>> +
>> +	/* variable size, based on bpf_program_num_elem */
>> +	struct sock_filter bpf_program[0];
>> +};
>> +
>> +#endif /*_XT_BPF_H */
>> diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
>> index c9739c6..c7cc0b8 100644
>> --- a/net/netfilter/Kconfig
>> +++ b/net/netfilter/Kconfig
>> @@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE
>>  	  If you want to compile it as a module, say M here and read
>>  	  <file:Documentation/kbuild/modules.txt>. If unsure, say `N'.
>>
>> +config NETFILTER_XT_MATCH_BPF
>> +	tristate '"bpf" match support'
>> +	depends on NETFILTER_ADVANCED
>> +	help
>> +	  BPF matching applies a Linux socket filter to each packet and
>> +	  accepts those for which the filter returns non-zero.
>> +
>> +	  To compile it as a module, choose M here. If unsure, say N.
>> +
>>  config NETFILTER_XT_MATCH_CLUSTER
>>  	tristate '"cluster" match support'
>>  	depends on NF_CONNTRACK
>> diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
>> index 8e5602f..9f12eeb 100644
>> --- a/net/netfilter/Makefile
>> +++ b/net/netfilter/Makefile
>> @@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
>>
>>  # matches
>>  obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o
>> +obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o
>>  obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
>>  obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
>>  obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
>> diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
>> index 8d987c3..26306be 100644
>> --- a/net/netfilter/x_tables.c
>> +++ b/net/netfilter/x_tables.c
>> @@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par,
>>  	if (XT_ALIGN(par->match->matchsize) != size &&
>>  	    par->match->matchsize != -1) {
>>  		/*
>> -		 * ebt_among is exempt from centralized matchsize checking
>> -		 * because it uses a dynamic-size data set.
>> +		 * Variable-size matches, such as ebt_among,
>> +		 * are exempt from centralized matchsize checking. They
>> +		 * skip the test by setting xt_match.matchsize to -1.
>>  		 */
>>  		pr_err("%s_tables: %s.%u match: invalid size "
>>  		       "%u (kernel) != (user) %u\n",
>> diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
>> new file mode 100644
>> index 0000000..07077c5
>> --- /dev/null
>> +++ b/net/netfilter/xt_bpf.c
>> @@ -0,0 +1,88 @@
>> +/* Xtables module to match packets using a BPF filter.
>> + * Copyright 2012 Google Inc.
>> + * Written by Willem de Bruijn <willemb@xxxxxxxxxx>
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + */
>> +
>> +#include <linux/module.h>
>> +#include <linux/skbuff.h>
>> +#include <linux/ipv6.h>
>> +#include <linux/filter.h>
>> +#include <net/ip.h>
>> +
>> +#include <linux/netfilter/xt_bpf.h>
>> +#include <linux/netfilter/x_tables.h>
>> +
>> +MODULE_AUTHOR("Willem de Bruijn <willemb@xxxxxxxxxx>");
>> +MODULE_DESCRIPTION("Xtables: BPF filter match");
>> +MODULE_LICENSE("GPL");
>> +MODULE_ALIAS("ipt_bpf");
>> +MODULE_ALIAS("ip6t_bpf");
>> +
>> +static int bpf_mt_check(const struct xt_mtchk_param *par)
>> +{
>> +	struct xt_bpf_info *info = par->matchinfo;
>> +	const struct xt_entry_match *match;
>> +	struct sock_fprog program;
>> +	int expected_len;
>> +
>> +	match = container_of(par->matchinfo, const struct xt_entry_match, data);
>> +	expected_len = sizeof(struct xt_entry_match) +
>> +		       sizeof(struct xt_bpf_info) +
>> +		       (sizeof(struct sock_filter) *
>> +			info->bpf_program_num_elem);
>> +
>> +	if (match->u.match_size != expected_len) {
>> +		pr_info("bpf: check failed: incorrect length\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	program.len = info->bpf_program_num_elem;
>> +	program.filter = info->bpf_program;
>> +	if (sk_unattached_filter_create(&info->filter, &program)) {
>> +		pr_info("bpf: check failed: parse error\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
>> +{
>> +	const struct xt_bpf_info *info = par->matchinfo;
>> +
>> +	return SK_RUN_FILTER(info->filter, skb);
>> +}
>> +
>> +static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
>> +{
>> +	const struct xt_bpf_info *info = par->matchinfo;
>> +	sk_unattached_filter_destroy(info->filter);
>> +}
>> +
>> +static struct xt_match bpf_mt_reg __read_mostly = {
>> +	.name		= "bpf",
>> +	.revision	= 0,
>> +	.family		= NFPROTO_UNSPEC,
>> +	.checkentry	= bpf_mt_check,
>> +	.match		= bpf_mt,
>> +	.destroy	= bpf_mt_destroy,
>> +	.matchsize	= -1, /* skip xt_check_match because of dynamic len */
>> +	.me		= THIS_MODULE,
>> +};
>> +
>> +static int __init bpf_mt_init(void)
>> +{
>> +	return xt_register_match(&bpf_mt_reg);
>> +}
>> +
>> +static void __exit bpf_mt_exit(void)
>> +{
>> +	xt_unregister_match(&bpf_mt_reg);
>> +}
>> +
>> +module_init(bpf_mt_init);
>> +module_exit(bpf_mt_exit);
>> --
>> 1.7.7.3
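
The variable-length layout that bpf_mt_check() validates can be sketched in userspace: one xt_entry_match header, then struct xt_bpf_info, then bpf_program_num_elem instructions, with match_size covering all three. The structs below are hand-written mirrors of the kernel and patch definitions, assumed to share their layout on common ABIs; `build_bpf_match()` and `check_len_sketch()` are hypothetical helpers, not code from the iptables extension.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hand-written mirror of struct sock_filter (<linux/filter.h>). */
struct sock_filter_sketch {
	uint16_t code;
	uint8_t jt, jf;
	uint32_t k;
};

/* Hand-written mirror of struct xt_entry_match (<linux/netfilter/x_tables.h>). */
struct entry_match_sketch {
	union {
		struct {
			uint16_t match_size;
			char name[29];		/* XT_EXTENSION_MAXNAMELEN */
			uint8_t revision;
		} user;
		struct {
			uint16_t match_size;
			void *match;		/* kernel-only pointer */
		} kernel;
		uint16_t match_size;
	} u;
	unsigned char data[];
};

/* Mirror of struct xt_bpf_info from the patch above. */
struct xt_bpf_info_sketch {
	uint16_t bpf_program_num_elem;
	void *filter __attribute__((aligned(8)));	/* filled in by the kernel */
	struct sock_filter_sketch bpf_program[];
};

/* Hypothetical helper: pack a program into one variable-length match blob. */
static struct entry_match_sketch *
build_bpf_match(const struct sock_filter_sketch *prog, uint16_t n, size_t *out_size)
{
	size_t size = sizeof(struct entry_match_sketch) +
		      sizeof(struct xt_bpf_info_sketch) +
		      (size_t)n * sizeof(*prog);
	struct entry_match_sketch *m;
	struct xt_bpf_info_sketch *info;

	if (size > UINT16_MAX)		/* match_size is only 16 bits wide */
		return NULL;
	m = calloc(1, size);
	if (!m)
		return NULL;
	m->u.user.match_size = (uint16_t)size;
	strcpy(m->u.user.name, "bpf");
	info = (struct xt_bpf_info_sketch *)m->data;
	info->bpf_program_num_elem = n;
	memcpy(info->bpf_program, prog, (size_t)n * sizeof(*prog));
	*out_size = size;
	return m;
}

/* Same arithmetic as the expected_len check in bpf_mt_check(). */
static int check_len_sketch(const struct entry_match_sketch *m)
{
	const struct xt_bpf_info_sketch *info =
		(const struct xt_bpf_info_sketch *)m->data;
	size_t expected = sizeof(struct entry_match_sketch) +
			  sizeof(struct xt_bpf_info_sketch) +
			  (size_t)info->bpf_program_num_elem *
			  sizeof(struct sock_filter_sketch);

	return m->u.match_size == expected;
}
```

Because match_size is a 16-bit field, the whole blob has to fit in 65535 bytes; the sketch rejects larger programs instead of letting the size wrap around and then fail the kernel's length check.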