Re: [PATCH bpf-next 02/10] bpf: Implement BPF link handling for tc BPF programs

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Thu, 6 Oct 2022 22:54:05 +0200

On 10/6/22 5:19 AM, Andrii Nakryiko wrote:
On Tue, Oct 4, 2022 at 4:12 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:

This work adds BPF links for tc. As a recap, a BPF link represents the attachment
of a BPF program to a BPF hook point. The BPF link holds a single reference to
keep BPF program alive. Moreover, hook points do not reference a BPF link, only
the application's fd or pinning does. A BPF link holds meta-data specific to
attachment and implements operations for link creation, (atomic) BPF program
update, detachment and introspection.

The motivation for BPF links for tc BPF programs is multi-fold, for example:

- "It's especially important for applications that are deployed fleet-wide
    and that don't "control" hosts they are deployed to. If such application
    crashes and no one notices and does anything about that, BPF program will
    keep running draining resources or even just, say, dropping packets. We
    at FB had outages due to such permanent BPF attachment semantics. With
    fd-based BPF link we are getting a framework, which allows safe, auto-
    detachable behavior by default, unless application explicitly opts in by
    pinning the BPF link." [0]

-  From Cilium-side the tc BPF programs we attach to host-facing veth devices
    and phys devices build the core datapath for Kubernetes Pods, and they
    implement forwarding, load-balancing, policy, EDT-management, etc, within
    BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently
    experienced hard-to-debug issues in a user's staging environment where
    another Kubernetes application using tc BPF attached to the same prio/handle
    of cls_bpf, wiping all Cilium-based BPF programs from underneath it. The
    goal is to establish a clear/safe ownership model via links which cannot
    accidentally be overridden. [1]

BPF links for tc can co-exist with non-link attachments, and the semantics are
in line also with XDP links: BPF links cannot replace other BPF links, BPF
links cannot replace non-BPF links, non-BPF links cannot replace BPF links and
lastly only non-BPF links can replace non-BPF links. In case of Cilium, this
would solve mentioned issue of safe ownership model as 3rd party applications
would not be able to accidentally wipe Cilium programs, even if they are not
BPF link aware.

Earlier attempts [2] have tried to integrate BPF links into core tc machinery
to solve cls_bpf, which has been intrusive to the generic tc kernel API with
extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could
be wiped from the qdisc also. Locking a tc BPF program in place this way, is
getting into layering hacks given the two object models are vastly different.
We chose to implement a prerequisite of the fd-based tc BPF attach API, so
that the BPF link implementation fits in naturally similar to other link types
which are fd-based and without the need for changing core tc internal APIs.

BPF programs for tc can then be successively migrated from cls_bpf to the new
tc BPF link without needing to change the program's source code, just the BPF
loader mechanics for attaching.

   [0] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@xxxxxxxxxxxxxx/
   [1] https://lpc.events/event/16/contributions/1353/
   [2] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@xxxxxxxxx/

Co-developed-by: Nikolay Aleksandrov <razor@xxxxxxxxxxxxx>
Signed-off-by: Nikolay Aleksandrov <razor@xxxxxxxxxxxxx>
Signed-off-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
---

have you considered supporting BPF cookie from the outset? It should
be trivial if you remove union from bpf_prog_array_item. If not, then
we should reject LINK_CREATE if bpf_cookie is non-zero.

Haven't considered it yet at this point, but we can add this in subsequent step,
agree, thus we should reject for now upon create.

  include/linux/bpf.h            |   5 +-
  include/net/xtc.h              |  14 ++++
  include/uapi/linux/bpf.h       |   5 ++
  kernel/bpf/net.c               | 116 ++++++++++++++++++++++++++++++---
  kernel/bpf/syscall.c           |   3 +
  tools/include/uapi/linux/bpf.h |   5 ++
  6 files changed, 139 insertions(+), 9 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 71e5f43db378..226a74f65704 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1473,7 +1473,10 @@ struct bpf_prog_array_item {
         union {
                 struct bpf_cgroup_storage *cgroup_storage[MAX_BPF_CGROUP_STORAGE_TYPE];
                 u64 bpf_cookie;
-               u32 bpf_priority;
+               struct {
+                       u32 bpf_priority;
+                       u32 bpf_id;

this is link_id, is that right? should we name it as such?

Ack, will rename, thanks also for all your other suggestions inthe various patches,
all make sense to me & will address them!

+               };
         };
  };


[...]