Re: [PATCH bpf-next 07/10] bpf: Add helpers to query conntrack info

Florian Westphal <fw@xxxxxxxxx> · Thu, 21 Oct 2021 09:36:25 +0200

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> Florian Westphal <fw@xxxxxxxxx> writes:
> 
> > Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >> > Lookups should be fine.  Insertions are the problem.
> >> >
> >> > NAT hooks are expected to execute before the insertion into the
> >> > conntrack table.
> >> >
> >> > If you insert before, NAT hooks won't execute, i.e.
> >> > rules that use dnat/redirect/masquerade have no effect.
> >> 
> >> Well yes, if you insert the wrong state into the conntrack table, you're
> >> going to get wrong behaviour. That's sorta expected, there are lots of
> >> things XDP can do to disrupt the packet flow (like just dropping the
> >> packets :)).
> >
> > Sure, but I'm not sure I understand the use case.
> >
> > Insertion at XDP layer turns off netfilters NAT capability, so its
> > incompatible with the classic forwarding path.
> >
> > If thats fine, why do you need to insert into the conntrack table to
> > begin with?  The entire infrastructure its designed for is disabled...
> 
> One of the major selling points of XDP is that you can reuse the
> existing kernel infrastructure instead of having to roll your own. So
> sure, one could implement their own conntrack using BPF maps (as indeed,
> e.g., Cilium has done), but why do that when you can take advantage of
> the existing one in the kernel? Same reason we have the bpf_fib_lookup()
> helper...

Insertion to conntrack via ebpf seems to be bad to me precisely because it
bypasses the existing infra.

In the bypass scenario you're envisioning, who is responsible for
fastpath-or-not decision?

> > In the HW offload case, conntrack is bypassed completely. There is an
> > IPS_(HW)_OFFLOAD_BIT that prevents the flow from expiring.
> 
> That's comparable in execution semantics (stack is bypassed entirely),
> but not in control plane semantics (we lookup from XDP instead of
> pushing flows down to an offload).

I'm not following.  As soon as you do insertion via XDP existing
control plane (*tables ruleset, xfrm and so on) becomes irrelevant.

Say e.g. user has a iptables ruleset that disables conntrack for udp dport
53 to avoid conntrack overhead for local resolver cache.

No longer relevant, ebpf overrides or whatever generates the epbf prog
needs to emulate existing config.

> > I suspect we'd want a way to notify/call an ebpf program instead so we
> > can avoid the ctnetlink -> userspace -> update dance and do the XDP
> > 'flow bypass information update' from inside the kernel and ebpf/XDP
> > reimplementation of the nf flow table (it uses the netfilter ingress
> > hook on the configured devices; everyhing it does should be doable
> > from XDP).
> 
> But the point is exactly that we don't have to duplicate the state into
> BPF, we can make XDP look it up directly.

Normally for fast bypass I'd expect that the bypass infra would want to
access all info in one lookup, but conntrack only gives you the NAT
transformation, so you'll also need a sk lookup and possibly a FIB
lookup later to get the route.
Also maybe an xfrm lookup as well if your bypass infra needs to support
ipsec.

So I neither understand the need for conntrack lookup (*for fast bypass use
case*) nor the need for insert IFF the control plane we have is to be
respected.