Re: [PATCH bpf-next v2 1/7] bpf: Add generic attach/detach/query API for multi-progs

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Fri, 9 Jun 2023 20:37:35 -0700

On Fri, Jun 9, 2023 at 8:03 PM Prankur gupta <prankgup@xxxxxx> wrote:
>
> >>
> >> Me, Daniel, Timo are arguing that there are real situations where you
> >> have to be first or need to die.
> >
> > afaik out of all xdp and tc progs there is not a single prog in the fb fleet
> > that has to be first.
> > fb's ddos and firewall don't have to be first.
> > cilium and datadog progs don't have to be first either.
> > The race between cilium and datadog was not the race to the first position,
> > but the conflict due to the same prio.
> > In all cases I'm aware of, prog owners care a lot about ordering,
> > but never about strict first.
>
> One use case we actively rely on in the Meta (fb) fleet is avoiding double writers
> for cgroup/sockops BPF programs. For example, multiple BPF programs can set the
> skops->reply field and step on each other: for the ECN callback,
> one program can set it to 1 and another can set it to 0.
> We avoid that by running a pre-func and a post-func around the
> sockops BPF programs in our custom-built chainer.
>
> We want these functions to be executed first and last respectively, which is
> what makes the above functionality useful for us.
>
> Hypothetical use case for cgroup/sockops: the middle BPF programs do not set
> skops->reply; the final BPF program, based on the results from each of the
> middle BPF programs, sets the appropriate value in skops->reply. This makes sure
> all the middle programs have executed and the final reply is correct.

cgroup progs are more complicated than a simple list of progs in tc/xdp.
It is not really possible for the kernel to guarantee absolute first and last
in a hierarchy of cgroups. In theory that's possible within a single cgroup,
but not when children and parents are involved: progs can be
attached anywhere in the hierarchy, and we need to keep the
uapi of BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI intact.
The absolute first/last is not the answer for this skops issue.
A different solution is necessary.
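
To make the hierarchy point concrete, here is a toy Python model (a sketch
only, not kernel code) of how the effective prog order for a cgroup could be
derived by walking from that cgroup up through its parents. It assumes the
commonly described semantics: the cgroup's own progs run first, and an
ancestor's progs are appended only if nothing has been collected yet or the
ancestor was attached with the MULTI flag. The Cgroup class, the flag
strings, and the prog names (pre_func, post_func, ...) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Toy stand-ins for "no flag", BPF_F_ALLOW_OVERRIDE and BPF_F_ALLOW_MULTI.
NONE, OVERRIDE, MULTI = "none", "override", "multi"


@dataclass
class Cgroup:
    name: str
    flag: str = NONE
    progs: List[str] = field(default_factory=list)  # attach order within this cgroup
    parent: Optional["Cgroup"] = None


def effective_progs(cg: Cgroup) -> List[str]:
    """Walk from cg up to the root and collect the progs that would run."""
    order: List[str] = []
    node: Optional[Cgroup] = cg
    while node is not None:
        # Once something has been collected, only MULTI ancestors contribute.
        if not order or node.flag == MULTI:
            order.extend(node.progs)
        node = node.parent
    return order


# A chainer in "child" wants pre_func first and post_func last.
parent = Cgroup("parent", MULTI, ["parent_prog"])
child = Cgroup("child", MULTI, ["pre_func", "app_prog", "post_func"], parent)
grandchild = Cgroup("grandchild", MULTI, ["gc_prog"], child)

print(effective_progs(child))
# ['pre_func', 'app_prog', 'post_func', 'parent_prog']
print(effective_progs(grandchild))
# ['gc_prog', 'pre_func', 'app_prog', 'post_func', 'parent_prog']
```

In this model post_func is last only within "child": for an event in "child"
the parent's prog still runs after it, and for an event in "grandchild"
pre_func is no longer first either. Ordering inside one cgroup does not
translate into absolute first/last across the hierarchy.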