Re: [PATCH bpf-next v2 1/2] bpf: add bpf_ct_lookup_{tcp,udp}() helpers

Joe Stringer <joe@xxxxxxxxxxx> · Fri, 24 Jan 2020 11:11:27 -0800

On Tue, Jan 21, 2020 at 12:36 PM Matt Cover <werekraken@xxxxxxxxx> wrote:
>
> On Tue, Jan 21, 2020 at 1:20 PM Matthew Cover <werekraken@xxxxxxxxx> wrote:
> >
> > Allow looking up an nf_conn. This allows eBPF programs to leverage
> > nf_conntrack state for similar purposes to socket state use cases,
> > as provided by the socket lookup helpers. This is particularly
> > useful when nf_conntrack state is locally available, but socket
> > state is not.

I think there's an important distinction between accessing sockets and
accessing the connection tracker: Sockets are inherently tied to local
processes. They consume resources regardless of what kind of fancy
networking behaviour you desire out of the stack. Connection-tracking
on the other hand only consumes resources if you enable features that
explicitly require that functionality. This raises some interesting
questions.

The kernel disables nf_conntrack by default to alleviate the costs
associated with it[0]. In the case of this proposal, the BPF program
itself is trying to use nf_conntrack, so does that mean that the
kernel should auto-enable nf_conntrack hooks for the current namespace
(or all namespaces, given that the helper provides access into other
namespaces as well) whenever a BPF program is loaded that uses this
helper?

Related side note: What if you wanted to migitate the performance
penalty of turning on nf_conntrack by programmatically choosing
whether to populate the ct table? Do we then need to define an
interface that allows a BPF program to tell nf_conntrack whether or
not to track a given connection?

More importantly, nf_conntrack has a particular view in mind of what a
connection is and the metadata that can be associated with a
connection. On the other hand, one of the big pulls for building
networking functionality in BPF is to allow flexibility. Over time,
more complex use cases will arise that demand additional metadata to
be stored with their connections. Cilium's connection tracking entries
provides a glimpse of this[1]. I'm sure that the OVS-BPF project would
have similar demands. Longer term, do we encourage such projects to
migrate to this implementation, proposing metadata extensions that are
programmable from BPF?

Taking the metadata question further, there is not only the metadata
that arbitrary BPF programs wish to associate with nf_conntrack. There
is also the various extensions that nf_conntrack itself has which
could be interesting for users that depend on that state. Would we
draw a line before providing access into those aspects of nf_conntrack
from BPF?

Beyond metadata, there is the question of write access to
nf_conntrack. Presumably if a read helper like this is added to the
BPF API, it is only balanced to also add create, update and delete
operations? No doubt if someone wants to build NAT or firewall
functionality in BPF using nf_conntrack, they will want this. Does
this take us on the track of eventually exporting the entire
nf_conntrack module (or even nf_nat) internal kernel APIs as external
BPF API?

If the BPF API is going to provide a connection tracker, I feel that
it should aim to solve connection tracking for various potential
users. This takes us from not just what this patch does, but to the
full vision of where this API goes with a connection tracker
implementation that could be reused by e.g. OVS-BPF or Cilium. At this
point, I'm not convinced why such an implementation should exist in
the BPF API rather than as a common library that can be forked and
tweaked for anyone's uses.

What do you see as the split of responsibility between BPF and other
subsystems long-term for your use case that motivates relying upon
nf_conntrack always running?

[0] https://github.com/torvalds/linux/commit/4d3a57f23dec59f0a2362e63540b2d01b37afe0a
[1] https://github.com/cilium/cilium/blob/v1.6.5/bpf/lib/common.h#L510