Re: [PATCH RFC 0/4] net: add bpfilter

Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> · Mon, 19 Feb 2018 16:42:40 -0500

> I see several possible areas of contention:
>
> 1) If you aim for a non-feature-complete support of iptables rules, it
>    will create confusion to the users.

Right, you need full feature parity to avoid ending up having to
maintain two implementations.

It seems uncontroversial that BPF can be very powerful if run at iptables
hooks. For performance, but also versatility. The android folks are converting
one out-of-tree module to BPF. There is probably a lot more such
business logic out there that is not suitable for inclusion in mainline as
an xt match/target, and that needs more access than xt_bpf can provide.

If a new first-class citizen BPF infra can do this and back the legacy
interface, too, that would save on maintenance. There is a steady
stream of fixes to iptables, e.g., from syzkaller vulnerability reports.
Just keeping the old implementation around as a dead letter is not a
safe deprecation strategy.

To bootstrap bpfilter, in the short term a reasonable set of iptables targets
and matches can perhaps be ported to BPF external functions with some
simple glue code.

> To me, this looks like some kind of legacy backwards compatibility
> mechanism that one would find in proprietary operating systems, but not
> in Linux.  iptables, libiptc etc. are all free software.  The source
> code can be edited, and you could just as well have a new version of
> iptables and/or libiptc which would pass the ruleset in userspace to
> your compiler, which would then insert the resulting eBPF program.
>
> Why add quite comprehensive kernel infrastructure?  What's the motivation
> here?

The ABI deprecation point has been discussed quite a bit. If it is
infeasible to just drop the old interface, then an upcall mechanism
does seem the most practical approach to dynamically generating this
code. FWIW, as BPF is being used in more places, other locations
besides iptables could make use of this.

> Could you please clarify why the 'filter' table INPUT chain was used if
> you're using XDP?  AFAICT they have completely different semantics.
>
> There is a well-conceived and generally understood notion of where
> exactly the filter/INPUT table processing happens.  And that's not as
> early as in the NIC, but it's much later in the processing of the
> packet.
>
> I believe _if_ one wants to use the approach of "hiding" eBPF behind
> iptables, then either
>
> a) the eBPF programs must be executed at the exact same points in the
>    stack as the existing hooks of the built-in chains of the
>    filter/nat/mangle/raw tables, or
>
> b) you must introduce new 'tables', like an 'xdp' table which then has
>    the notion of processing very early in processing, way before the
>    normal filter table INPUT processing happens.

Agreed. One of the larger issues in the Android qtaguid conversion
was the state surrounding the skb at the time of processing. This
example primarily depended on having skb->sk set. Whether that is
available at tc depends on early decap, and even when set, the sk
might prove different from the final one in the socket layer in
edge cases. Just one example of how moving the call site can be very
fragile wrt state.
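
As a rough illustration of that fragility, here is a minimal userspace
sketch of an owner-style match that depends on a socket being attached
to the packet. This is not kernel code: model_sock, model_skb,
owner_match and the verdict values are hypothetical stand-ins that
only model the skb->sk dependency described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sock and struct sk_buff.
 * These are NOT the kernel definitions. */
struct model_sock {
	unsigned int uid;		/* owner of the socket */
};

struct model_skb {
	const struct model_sock *sk;	/* NULL if no socket is attached yet */
};

enum verdict {
	VERDICT_ACCEPT,
	VERDICT_DROP,
	VERDICT_UNKNOWN,	/* match cannot be evaluated at this call site */
};

/* Drop packets owned by blocked_uid. Without an attached socket the
 * match has nothing to look at, so it reports VERDICT_UNKNOWN rather
 * than guessing. */
static enum verdict owner_match(const struct model_skb *skb,
				unsigned int blocked_uid)
{
	assert(skb != NULL);

	if (skb->sk == NULL)
		return VERDICT_UNKNOWN;

	return skb->sk->uid == blocked_uid ? VERDICT_DROP : VERDICT_ACCEPT;
}
```

The same rule gives a definite verdict when the model packet carries a
socket and VERDICT_UNKNOWN when it does not, which is the kind of
behavior change a moved call site can introduce.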

Another issue wrt moving around is availability of external functions
at different layers. XDP has access to far fewer than TC. For iptables,
I would imagine that you either want parity with TC or even a new
independent type. Parity would be useful also to expose some xt_match
functionality at the TC layer that is currently missing there.

> My main points are:
>
> 1) What is the goal of this?

My high bit feedback: for cases like qtaguid, it is very useful to be able
to execute BPF as drop-in at existing iptables locations, as is having
various match and target functionality available from BPF.

Maintaining the legacy ABI is basically dictated. If this can be achieved
while optimizing the runtime path and reducing maintenance, that is
very appealing.