Re: [PATCH bpf-next v1 00/10] bpfilter

Yonghong Song <yhs@xxxxxx> · Wed, 9 Jun 2021 17:50:13 -0700

On 6/3/21 3:14 AM, Dmitrii Banshchikov wrote:
The patchset is based on the patches from David S. Miller [1] and
Daniel Borkmann [2].

The main goal of the patchset is to prepare bpfilter for
iptables' configuration blob parsing and code generation.

The patchset introduces data structures and code for matches,
targets, rules and tables.

The current version misses handling of counters. Postpone its
implementation until the code generation phase as it's not clear
yet how to better handle them.

Beside that there is no support of net namespaces at all.

In the next iteration basic code generation shall be introduced.

The rough plan for the code generation.

It seems reasonable to assume that the first rules should cover
most of the packet flow.  This is why they are critical from the
performance point of view.  At the same time number of user
defined rules might be pretty large. Also there is a limit on
size and complexity of a BPF program introduced by the verifier.

There are two approaches how to handle iptables' rules in
generated BPF programs.

The first approach is to generate a BPF program that is an
equivalent to a set of rules on a rule by rule basis. This
approach should give the best performance. The drawback is the
limitation from the verifier on size and complexity of BPF
program.

The second approach is to use an internal representation of rules
stored in a BPF map and use bpf_for_each_map_elem() helper to
iterate over them. In this case the helper's callback is a BPF
function that is able to process any valid rule.

Combination of the two approaches should give most of the
benefits - a heuristic should help to select a small subset of
the rules for code generation on a rule by rule basis. All other
rules are cold and it should be possible to store them in an
internal form in a BPF map. The rules will be handled by
bpf_for_each_map_elem().  This should remove the limit on the
number of supported rules.

Agree. A bpf program inlines some hot rule handling and put
the rest in for_each_map_elem() sounds reasonable to me.

During development it was useful to use statically linked
sanitizers in bpfilter usermode helper. Also it is possible to
use fuzzers but it's not clear if it is worth adding them to the
test infrastructure - because there are no other fuzzers under
tools/testing/selftests currently.

Patch 1 adds definitions of the used types.
Patch 2 adds logging to bpfilter.
Patch 3 adds bpfilter header to tools
Patch 4 adds an associative map.
Patches 5/6/7/8 add code for matches, targets, rules and table.
Patch 9 handles hooked setsockopt(2) calls.
Patch 10 uses prepared code in main().

Here is an example:
% dmesg  | tail -n 2
[   23.636102] bpfilter: Loaded bpfilter_umh pid 181
[   23.658529] bpfilter: started
% /usr/sbin/iptables-legacy -L -n

So this /usr/sbin/iptables-legacy is your iptables variant to
translate iptable command lines to BPFILTER_IPT_SO_*,
right? It could be good to provide a pointer to the source
or binary so people can give a try.

I am not an expert in iptables. Reading codes, I kind of
can grasp the high-level ideas of the patch, but probably
Alexei or Daniel can review some details whether the
design is sufficient to be an iptable replacement.

Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

[...]