Re: [PATCH bpf-next v2 0/7] Add bpf_link based TC-BPF API

On Fri, Jun 10, 2022 at 5:58 AM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:
>
> On Fri, Jun 10, 2022 at 05:54:27AM IST, Joanne Koong wrote:
> > On Thu, Jun 3, 2021 at 11:31 PM Kumar Kartikeya Dwivedi
> > <memxor@xxxxxxxxx> wrote:
> > >
> > > This is the second (non-RFC) version.
> > >
> > > This adds a bpf_link path to create TC filters tied to the cls_bpf classifier, and
> > > introduces fd based ownership for such TC filters. Netlink cannot delete or
> > > replace such filters, but the bpf_link is severed on indirect destruction of the
> > > filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
> > > that filters remain attached beyond process lifetime, the usual bpf_link fd
> > > pinning approach can be used.
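> > >
> > > For illustration, a rough sketch of the pinning flow (assuming the libbpf
> > > helper added in patch 6 hands back a struct bpf_link *; the attach call
> > > itself is elided since its name and signature are defined by that patch,
> > > while bpf_link__pin() is existing libbpf API):
> > >
> > >   struct bpf_link *link;
> > >   int err;
> > >
> > >   link = ...; /* obtained from the new TC bpf_link attach helper (patch 6) */
> > >
> > >   /* bpf_link__pin() is existing libbpf API; the bpffs path is arbitrary */
> > >   err = bpf_link__pin(link, "/sys/fs/bpf/tc_ingress_link");
> > >   if (err)
> > >           /* handle error */;
> > >
> > >   /* the link (and hence the filter) now outlives the process; removing the
> > >    * pin via unlink() or bpftool drops that reference again */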
> > >
> > > The individual patches contain more details and comments, but the overall kernel
> > > API and libbpf helper mirror the semantics of the netlink based TC-BPF API
> > > merged recently. This means we start by always setting direct action mode,
> > > protocol to ETH_P_ALL, chain_index to 0, etc. If there is a need for more
> > > options, they can easily be exposed through the bpf_link API later.
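> > >
> > > As a reference point, this is roughly how the recently merged netlink based
> > > libbpf TC API is used today (it, too, always attaches in direct-action mode);
> > > the bpf_link variant mirrors these semantics, only the attach call differs.
> > > ifindex and prog_fd are assumed to be set up by the caller:
> > >
> > >   DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = ifindex,
> > >                       .attach_point = BPF_TC_INGRESS);
> > >   DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .prog_fd = prog_fd);
> > >
> > >   /* create the clsact qdisc if it does not exist yet */
> > >   err = bpf_tc_hook_create(&hook);
> > >   if (err && err != -EEXIST)
> > >           goto out;
> > >
> > >   /* attach the prog as a cls_bpf filter, direct-action set implicitly */
> > >   err = bpf_tc_attach(&hook, &opts);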
> > >
> > > Patch 1 refactors the cls_bpf change function to extract two helpers that will be
> > > reused in bpf_link creation.
> > >
> > > Patch 2 exports some bpf_link management functions to modules. This is needed
> > > because our bpf_link object is tied to the cls_bpf_prog object. Tying it to
> > > tcf_proto would be awkward, because the update path has to replace the offloaded
> > > bpf prog, which happens using internal cls_bpf helpers, and it would in general
> > > mean more code to abstract over an operation that is unlikely to be implemented
> > > for other filter types.
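> > >
> > > Concretely, this amounts to EXPORT_SYMBOL_GPL() lines for the generic
> > > bpf_link helpers in kernel/bpf/syscall.c, roughly along these lines (the
> > > exact set is what patch 2 defines; shown only to illustrate the shape of
> > > the change):
> > >
> > >   EXPORT_SYMBOL_GPL(bpf_link_init);
> > >   EXPORT_SYMBOL_GPL(bpf_link_prime);
> > >   EXPORT_SYMBOL_GPL(bpf_link_settle);
> > >   EXPORT_SYMBOL_GPL(bpf_link_cleanup);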
> > >
> > > Patch 3 adds the main bpf_link API. A function in cls_api takes care of
> > > obtaining block reference, creating the filter object, and then calls the
> > > bpf_link_change tcf_proto op (only supported by cls_bpf) that returns a fd after
> > > setting up the internal structures. An optimization is made to not keep around
> > > resources for extended actions, which is explained in a code comment as it wasn't
> > > immediately obvious.
> > >
> > > Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports
> > > replacing the bpf_prog, we can skip the tc filter's change path by reusing the
> > > filter object but swapping its bpf_prog. This takes care of replacing the
> > > offloaded prog as well (if that fails, the update is aborted). So far, however,
> > > tcf_classify could do a plain (possibly torn) load, as cls_bpf_prog->filter
> > > would never be modified concurrently. This is no longer true, and since we do
> > > not want to penalize the classify hot path, we also cannot impose serialization
> > > around its load. Hence the load is changed to READ_ONCE, so that the pointer
> > > value is always consistent. Since the invocation happens in an RCU critical
> > > section, the lifetime of the prog is guaranteed for the duration of the call.
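> > >
> > > To make the race concrete, a minimal sketch of the pattern described above
> > > (illustrative only, not the actual cls_bpf diff; the update side may use a
> > > different publish primitive than WRITE_ONCE in the real patch):
> > >
> > >   /* update path: publish the new prog; pairs with the READ_ONCE() below */
> > >   WRITE_ONCE(prog->filter, new_filter);
> > >
> > >   /* tcf_classify hot path, runs under rcu_read_lock(): load the pointer
> > >    * exactly once so it can neither be torn nor re-read inconsistently */
> > >   struct bpf_prog *filter = READ_ONCE(prog->filter);
> > >
> > >   /* RCU guarantees the prog stays alive for the duration of this call */
> > >   bpf_prog_run(filter, skb);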
> > >
> > > Patches 5 and 6 take care of updating the userspace bits and add a bpf_link returning
> > > function to libbpf.
> > >
> > > Patch 7 adds a selftest that exercises all possible problematic interactions
> > > that I could think of.
> > >
> > > Design:
> > >
> > > This is where our bpf_link object is attached in the object hierarchy.
> > >
> > >                                                                             ┌─────┐
> > >                                                                             │     │
> > >                                                                             │ BPF │
> > >                                                                             program
> > >                                                                             │     │
> > >                                                                             └──▲──┘
> > >                                                       ┌───────┐                │
> > >                                                       │       │         ┌──────┴───────┐
> > >                                                       │  mod  ├─────────► cls_bpf_prog │
> > > ┌────────────────┐                                    │cls_bpf│         └────┬───▲─────┘
> > > │    tcf_block   │                                    │       │              │   │
> > > └────────┬───────┘                                    └───▲───┘              │   │
> > >          │          ┌─────────────┐                       │                ┌─▼───┴──┐
> > >          └──────────►  tcf_chain  │                       │                │bpf_link│
> > >                     └───────┬─────┘                       │                └────────┘
> > >                             │          ┌─────────────┐    │
> > >                             └──────────►  tcf_proto  ├────┘
> > >                                        └─────────────┘
> > >
> > > The bpf_link is detached on destruction of the cls_bpf_prog. Doing it this way
> > > allows us to implement update in a lightweight manner without having to create
> > > a new filter; we can just replace the BPF prog attached to the cls_bpf_prog.
> > >
> > > The other way to do it would be to link the bpf_link to tcf_proto, but there are
> > > numerous downsides to this:
> > >
> > > 1. All filters have to embed the pointer even though they won't be using it when
> > > cls_bpf is compiled in.
> > > 2. This probably won't make sense to be extended to other filter types anyway.
> > > 3. We aren't able to optimize the update case without adding another bpf_link
> > > specific update operation to tcf_proto ops.
> > >
> > > The downside of tying this to the module is having to export bpf_link
> > > management functions and introduce a tcf_proto op. Hopefully the cost of
> > > another operation func pointer is acceptable (as there is only one ops
> > > struct per module).
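> > >
> > > For reference, the shape of that change is roughly the following; the real
> > > signature of the new op is whatever patch 3 defines, this is only a sketch:
> > >
> > >   /* one new op in struct tcf_proto_ops (include/net/sch_generic.h),
> > >    * implemented only by cls_bpf; every other classifier leaves it NULL.
> > >    * The argument list shown here is illustrative, not the real UAPI. */
> > >   struct tcf_proto_ops {
> > >           ...
> > >           int     (*bpf_link_change)(struct net *net, struct tcf_proto *tp,
> > >                                      struct bpf_link *link, ...);
> > >           ...
> > >   };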
> > >
> > Hi Kumar,
> >
> > Do you have any plans / bandwidth to land this feature upstream? If
> > so, do you have a tentative estimation for when you'll be able to work
> > on this? And if not, are you okay with someone else working on this to
> > get it merged in?
> >
>
> I can have a look at resurrecting it later this month, if you're ok with waiting
> until then. Otherwise, if someone else wants to pick this up before that, it's
> fine by me; just let me know so we avoid duplicated effort. Note that the
> approach in v2 is dead/unlikely to be accepted by the TC maintainers, so we'd
> have to implement it the way Daniel mentioned in [0].

Sounds great! We'll wait and check back in with you later this month.

>
>   [0]: https://lore.kernel.org/bpf/15cd0a9c-95a1-9766-fca1-4bf9d09e4100@xxxxxxxxxxxxx
>
> > The reason I'm asking is because there are a few networking teams
> > within Meta that have been requesting this feature :)
> >
> > Thanks,
> > Joanne
> >
> > > Changelog:
> > > ----------
> > > v1 (RFC) -> v2
> > > v1: https://lore.kernel.org/bpf/20210528195946.2375109-1-memxor@xxxxxxxxx
> > >
> > >  * Avoid overwriting other members of union in bpf_attr (Andrii)
> > >  * Set link to NULL after bpf_link_cleanup to avoid double free (Andrii)
> > >  * Use __be16 to store the result of htons (Kernel Test Robot)
> > >  * Make assignment of tcf_exts::net conditional on CONFIG_NET_CLS_ACT
> > >    (Kernel Test Robot)
> > >
> > > Kumar Kartikeya Dwivedi (7):
> > >   net: sched: refactor cls_bpf creation code
> > >   bpf: export bpf_link functions for modules
> > >   net: sched: add bpf_link API for bpf classifier
> > >   net: sched: add lightweight update path for cls_bpf
> > >   tools: bpf.h: sync with kernel sources
> > >   libbpf: add bpf_link based TC-BPF management API
> > >   libbpf: add selftest for bpf_link based TC-BPF management API
> > >
> > >  include/linux/bpf_types.h                     |   3 +
> > >  include/net/pkt_cls.h                         |  13 +
> > >  include/net/sch_generic.h                     |   6 +-
> > >  include/uapi/linux/bpf.h                      |  15 +
> > >  kernel/bpf/syscall.c                          |  14 +-
> > >  net/sched/cls_api.c                           | 139 ++++++-
> > >  net/sched/cls_bpf.c                           | 389 ++++++++++++++++--
> > >  tools/include/uapi/linux/bpf.h                |  15 +
> > >  tools/lib/bpf/bpf.c                           |   8 +-
> > >  tools/lib/bpf/bpf.h                           |   8 +-
> > >  tools/lib/bpf/libbpf.c                        |  59 ++-
> > >  tools/lib/bpf/libbpf.h                        |  17 +
> > >  tools/lib/bpf/libbpf.map                      |   1 +
> > >  tools/lib/bpf/netlink.c                       |   5 +-
> > >  tools/lib/bpf/netlink.h                       |   8 +
> > >  .../selftests/bpf/prog_tests/tc_bpf_link.c    | 285 +++++++++++++
> > >  16 files changed, 940 insertions(+), 45 deletions(-)
> > >  create mode 100644 tools/lib/bpf/netlink.h
> > >  create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c
> > >
> > > --
> > > 2.31.1
> > >
>
> --
> Kartikeya



