Re: [PATCH nf-next] netfilter: nf_tables: add ebpf expression

Eyal Birger <eyal.birger@xxxxxxxxx> · Mon, 5 Sep 2022 20:50:20 +0300

On Fri, Sep 2, 2022 at 7:53 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Wed, Aug 31, 2022 at 10:18 PM Eyal Birger <eyal.birger@xxxxxxxxx> wrote:
> >
> > On Thu, Sep 1, 2022 at 1:16 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> > >
> > > On 8/31/22 7:26 PM, Alexei Starovoitov wrote:
> > > > On Wed, Aug 31, 2022 at 8:53 AM Florian Westphal <fw@xxxxxxxxx> wrote:
> > > >> Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
> > > > >>>> 1 and 2 have the upside that it's easy to handle a 'file not found'
> > > >>>> error.
> > > >>>
> > > >>> I'm strongly against calling into bpf from the inner guts of nft.
> > > >>> Nack to all options discussed in this thread.
> > > >>> None of them make any sense.
> > > >>
> > > >> -v please.  I can just rework userspace to allow going via xt_bpf
> > > > >> but it's brain damaged.
> > > >
> > > > Right. xt_bpf was a dead end from the start.
> > > > It's time to deprecate it and remove it.
> > > >
> > > > >> This helps gradually moving towards more ebpf for those that
> > > >> still heavily rely on the classic forwarding path.
> > > >
> > > > No one is using it.
> > > > If it was, we would have seen at least one bug report over
> > > > all these years. We've seen none.
> > > >
> > > > tbh we had a fair share of wrong design decisions that look
> > > > very reasonable early on and turned out to be useless with
> > > > zero users.
> > > > BPF_PROG_TYPE_SCHED_ACT and BPF_PROG_TYPE_LWT*
> > > > > are in this category.
> > > > > All this code does is bit rot.
> > >
> > > +1
> > >
> > > > As a minimum we shouldn't step on the same rakes.
> > > > xt_ebpf would be the same dead code as xt_bpf.
> > >
> > > +1, and on top, the user experience will just be horrible. :(
> > >
> > > >> If you are open to BPF_PROG_TYPE_NETFILTER I can go that route
> > > >> as well, raw bpf program attachment via NF_HOOK and the bpf dispatcher,
> > > >> but it will take significantly longer to get there.
> > > >>
> > > >> It involves reviving
> > > >> https://lore.kernel.org/netfilter-devel/20211014121046.29329-1-fw@xxxxxxxxx/
> > > >
> > > > I missed it earlier. What is the end goal ?
> > > > Optimize nft run-time with on the fly generation of bpf byte code ?
> > >
> > > Or rather to provide a pendant to nft given existence of xt_bpf, and the
> > > latter will be removed at some point? (If so, can't we just deprecate the
> > > old xt_bpf?)
> >
> > FWIW we've been using both lwt bpf and xt_bpf on our production workloads
> > for a few years now.
> >
> > xt_bpf allows us to apply custom sophisticated policy logic at connection
> > establishment - which is not really possible (or efficient) using
> > iptables/nft constructs - without needing to reinvent all the facilities that
> > nf provides like connection tracking, ALGs, and simple filtering.
> >
> > As for lwt bpf, we use it for load balancing towards collect md tunnels.
> > While this can be done at tc egress for unfragmented packets, the lwt out hook -
> > when used in tandem with nf fragment reassembly - provides a hooking point
> > where a bpf program can see reassembled packets and load balance based on
> > their internals.
>
> Sounds very interesting!
> Any open source code to look at ?

There isn't any for these projects at this point. But part of the benefit of
these specific hooking points is that our custom logic is narrowly scoped and
integrates well with the "classical" forwarding path.

In netfilter we have an identity-based policy engine that provisions sets of
bpf maps. These maps are used by policy programs invoked by xt_bpf on
connection establishment as part of a larger set of iptables rules.

With LWT, this solved a problem we had with fragmented traffic: our load
balancing solution supports - among other things - IPsec stickiness based
on the ESP-in-UDP SPI, and as such needs to see unfragmented traffic.
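
To illustrate why SPI-based stickiness needs whole packets, here is a minimal
userspace C sketch. It is not the production code. It assumes RFC 3948
framing, where the SPI is the first four bytes of the UDP payload and four
zero bytes mark an IKE message instead. The names esp_in_udp_spi() and
pick_backend(), and the multiplicative hash, are hypothetical choices made
for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the ESP SPI carried in a UDP payload (ESP-in-UDP, RFC 3948),
 * or 0 when the payload is not ESP: too short (e.g. a 1-byte NAT
 * keepalive) or starting with the four-zero-byte non-ESP marker (IKE).
 */
static uint32_t esp_in_udp_spi(const uint8_t *payload, size_t len)
{
    if (len < 4)
        return 0;

    /* The SPI is in network byte order at the start of the payload. */
    return ((uint32_t)payload[0] << 24) | ((uint32_t)payload[1] << 16) |
           ((uint32_t)payload[2] << 8)  |  (uint32_t)payload[3];
}

/*
 * Hypothetical backend selection: hash the SPI so that every packet of
 * one IPsec SA lands on the same backend. Returns -1 when there is no
 * SPI to key on or no backend to choose from.
 */
static int pick_backend(uint32_t spi, unsigned int n_backends)
{
    if (spi == 0 || n_backends == 0)
        return -1;

    uint32_t h = spi * 2654435761u; /* simple multiplicative hash */
    return (int)((h >> 16) % n_backends);
}
```

Only the first fragment of a fragmented datagram carries the UDP header and
the SPI, so a lookup like this can only be applied consistently to packets
that are unfragmented or already reassembled.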

Eyal.