Re: [PATCH next] iptables: add xt_bpf match

Willem de Bruijn <willemb@xxxxxxxxxx> · Wed, 9 Jan 2013 19:08:06 -0500

On Wed, Jan 9, 2013 at 4:52 AM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> Hi Willem,
>
> On Tue, Jan 08, 2013 at 08:58:37PM -0500, Willem de Bruijn wrote:
>> On Mon, Jan 7, 2013 at 10:21 PM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
>> > Hi Willem,
>> >
>> > On Sun, Dec 09, 2012 at 04:52:58PM -0500, Willem de Bruijn wrote:
>> >> Support arbitrary linux socket filter (BPF) programs as iptables
>> >> match rules. This allows for very expressive filters, and on
>> >> platforms with BPF JIT appears competitive with traditional hardcoded
>> >> iptables rules.
>> >>
>> >> At least, on an x86_64 that achieves 40K netperf TCP_STREAM without
>> >> any iptables rules (40 Gbps),
>> >>
>> >> inserting 100x this bpf rule gives 28K
>> >>
>> >>     ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j
>> >>
>> >>     (as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',')
>> >>
>> >> inserting 100x this u32 rule gives 21K
>> >>
>> >>     ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP
>> >>
>> >> The two are logically equivalent, as far as I can tell. Let me know
>> >> if my test methodology is flawed in some way. Even in cases where
>> >> slower, the filter adds functionality currently lacking in iptables,
>> >> such as access to sk_buff fields like rxhash and queue_mapping.
>> >>
>> >> Signed-off-by: Willem de Bruijn <willemb@xxxxxxxxxx>
>> >> ---
>> >>  include/linux/netfilter/xt_bpf.h |   17 +++++++
>> >>  net/netfilter/Kconfig            |    9 ++++
>> >>  net/netfilter/Makefile           |    1 +
>> >>  net/netfilter/x_tables.c         |    5 +-
>> >>  net/netfilter/xt_bpf.c           |   86 ++++++++++++++++++++++++++++++++++++++
>> >>  5 files changed, 116 insertions(+), 2 deletions(-)
>> >>  create mode 100644 include/linux/netfilter/xt_bpf.h
>> >>  create mode 100644 net/netfilter/xt_bpf.c
>> >>
>> >> diff --git a/include/linux/netfilter/xt_bpf.h b/include/linux/netfilter/xt_bpf.h
>> >> new file mode 100644
>> >> index 0000000..23502c0
>> >> --- /dev/null
>> >> +++ b/include/linux/netfilter/xt_bpf.h
>> >> @@ -0,0 +1,17 @@
>> >> +#ifndef _XT_BPF_H
>> >> +#define _XT_BPF_H
>> >> +
>> >> +#include <linux/filter.h>
>> >> +#include <linux/types.h>
>> >> +
>> >> +struct xt_bpf_info {
>> >> +     __u16 bpf_program_num_elem;
>> >> +
>> >> +     /* only used in kernel */
>> >> +     struct sk_filter *filter __attribute__((aligned(8)));
>> >
>> > I see. You set match->userspacesize to zero in libxt_bpf to skip the
>> > comparison of that internal struct sk_filter *filter.
>> >
>> >> +
>> >> +     /* variable size, based on program_num_elem */
>> >> +     struct sock_filter bpf_program[0];
>> >
>> > While testing this I noticed:
>> >
>> > iptables -I OUTPUT -m bpf --bytecode   \
>> >         '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT
>> >
>> > Note that this works but it should not.
>> >
>> > iptables -D OUTPUT -m bpf --bytecode   \
>> >         '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,1 0 0 0' -j ACCEPT
>> >                                                                ^
>> > Mind that 1, it's a different filter, but it deletes the previous
>> > filter without problems here.
>> >
>> > A quick look at make_delete_mask() in iptables tells me that the
>> > changes you made to userspace to allow variable size matches are not
>> > enough to generate a sane mask (which is fundamental while looking for
>> > a matching rule during the deletion).
>>
>> Thanks for finding this, Pablo. I completely forgot to check that.
>>
>> I've never looked at that deletion code before. Will read it and
>> hopefully propose a simple fix in a few days. An earlier version of
>> the patch used a statically sized struct, by the way, like xt_string
>> does (XT_STRING_MAX_PATTERN_SIZE). If it is easier to
>> incorporate, we can always revert to that.
>
> I'd prefer that this stick to a static size for now.

Okay. That is actually a lot simpler.
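
For illustration only, a statically sized layout and the comparison it enables might look like the sketch below. This is not the patch itself: the `example_` names and the cap of 64 instructions are assumptions made up for this sketch, and the instruction struct is a local mirror of `struct sock_filter` so the snippet builds in userspace. The point is that once the program array has a fixed size, "same rule" reduces to a byte comparison of everything before the kernel-only pointer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cap; the thread only says BPF_MAXINSNS is too much per rule. */
#define EXAMPLE_BPF_MAX_INSNS 64

/* Local mirror of struct sock_filter (code, jt, jf, k). */
struct example_sock_filter {
	uint16_t code;
	uint8_t  jt;
	uint8_t  jf;
	uint32_t k;
};

/* Statically sized variant: user-visible fields first, kernel-only last. */
struct example_bpf_info {
	uint16_t num_elem;
	struct example_sock_filter program[EXAMPLE_BPF_MAX_INSNS];

	/* only used in kernel; excluded from the comparison below */
	void *filter __attribute__((aligned(8)));
};

/*
 * Compare only the bytes that precede the kernel-only pointer.
 * Padding bytes are compared too, so both structs must have been
 * zeroed with memset()/calloc() before they were filled in.
 */
static int example_same_rule(const struct example_bpf_info *a,
			     const struct example_bpf_info *b)
{
	return memcmp(a, b, offsetof(struct example_bpf_info, filter)) == 0;
}
```

With a layout like this, two rules that differ in any instruction field compare as different, while a differing kernel pointer is ignored.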

> The problem is that
> BPF_MAXINSNS is probably too much to allocate per rule. So you'll have
> to limit this to some reasonable amount of lines in the filter.
> Please, also check that iptables-save and iptables-restore work fine,
> there is also some problem with the existing code.

Done. I'll send updated patches right after this. I verified that they work using:

## test append
# fail: more instructions than the leading count (7 vs. 6)
./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,6 0 0 0' -j ACCEPT
# fail: fewer instructions than the leading count (5 vs. 6)
./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96' -j ACCEPT
# pass: correct
./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT

## test delete
# fail: differs
./iptables -D OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 1' -j ACCEPT
# pass: same
./iptables -D OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0' -j ACCEPT

## test save/restore
./iptables-save > out && cat out && ./iptables-restore < out && echo "OK"
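
The append cases above hinge on the leading count agreeing with the number of `code jt jf k` tuples that follow it. A minimal sketch of that check is shown below. It is not the libxt_bpf parser: `example_parse_bytecode`, `struct example_insn`, and the cap of 64 are hypothetical names and values chosen for this illustration, and tolerating a trailing comma (as produced by `tr '\n' ','`) is likewise an assumption.

```c
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_MAX_INSNS 64	/* hypothetical cap, for this sketch only */

struct example_insn {
	uint16_t code;
	uint8_t  jt;
	uint8_t  jf;
	uint32_t k;
};

/*
 * Parse "N,code jt jf k,code jt jf k,...".
 * Returns the instruction count on success, or -1 if the string is
 * malformed or the number of tuples does not equal the leading N.
 */
static int example_parse_bytecode(const char *s, struct example_insn *out,
				  int max)
{
	unsigned int declared, code, jt, jf, k;
	int used, n = 0;

	if (sscanf(s, "%u%n", &declared, &used) != 1 ||
	    declared == 0 || declared > (unsigned int)max)
		return -1;
	s += used;

	while (*s == ',') {
		s++;
		while (*s == ' ')
			s++;
		if (*s == '\0')
			break;		/* tolerate one trailing comma */
		if (sscanf(s, "%u %u %u %u%n", &code, &jt, &jf, &k, &used) != 4)
			return -1;
		if ((unsigned int)n >= declared)
			return -1;	/* more tuples than declared */
		if (code > 0xffff || jt > 0xff || jf > 0xff)
			return -1;	/* field out of range */
		out[n].code = (uint16_t)code;
		out[n].jt = (uint8_t)jt;
		out[n].jf = (uint8_t)jf;
		out[n].k = k;
		n++;
		s += used;
		while (*s == ' ')
			s++;
	}

	if (*s != '\0' || (unsigned int)n != declared)
		return -1;		/* garbage, or fewer tuples than declared */
	return n;
}
```

Run against the three append strings above, a check like this rejects the 7-tuple and 5-tuple programs and accepts the 6-tuple one.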

I did not retest the datapath for this revision.