Re: [PATCH bpf-next v4 1/8] bpf: Add generic attach/detach/query API for multi-progs

Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> · Tue, 11 Jul 2023 11:51:44 -0700

On Mon, Jul 10, 2023 at 5:23 PM Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Mon, Jul 10, 2023 at 10:12:11PM +0200, Daniel Borkmann wrote:
> > + *
> > + *   struct bpf_mprog_entry *entry, *peer;
> > + *   int ret;
> > + *
> > + *   // bpf_mprog user-side lock
> > + *   // fetch active @entry from attach location
> > + *   [...]
> > + *   ret = bpf_mprog_attach(entry, [...]);
> > + *   if (ret >= 0) {
> > + *       peer = bpf_mprog_peer(entry);
> > + *       if (bpf_mprog_swap_entries(ret))
> > + *           // swap @entry to @peer at attach location
> > + *       bpf_mprog_commit(entry);
> > + *       ret = 0;
> > + *   } else {
> > + *       // error path, bail out, propagate @ret
> > + *   }
> > + *   // bpf_mprog user-side unlock
> > + *
> > + *  Detach case:
> > + *
> > + *   struct bpf_mprog_entry *entry, *peer;
> > + *   bool release;
> > + *   int ret;
> > + *
> > + *   // bpf_mprog user-side lock
> > + *   // fetch active @entry from attach location
> > + *   [...]
> > + *   ret = bpf_mprog_detach(entry, [...]);
> > + *   if (ret >= 0) {
> > + *       release = ret == BPF_MPROG_FREE;
> > + *       peer = release ? NULL : bpf_mprog_peer(entry);
> > + *       if (bpf_mprog_swap_entries(ret))
> > + *           // swap @entry to @peer at attach location
> > + *       bpf_mprog_commit(entry);
> > + *       if (release)
> > + *           // free bpf_mprog_bundle
> > + *       ret = 0;
> > + *   } else {
> > + *       // error path, bail out, propagate @ret
> > + *   }
> > + *   // bpf_mprog user-side unlock
>
> Thanks for the doc. It helped a lot.
> And when it's contained like this it's easier to discuss api.
> It seems bpf_mprog_swap_entries() is trying to abstract the error code
> away, but BPF_MPROG_FREE leaks out and tcx_entry_needs_release()
> captures it with extra miniq_active twist, which I don't understand yet.
> bpf_mprog_peer() is also leaking a bit of implementation detail.
> Can we abstract it further, like:
>
> ret = bpf_mprog_detach(entry, [...], &new_entry);
> if (ret >= 0) {
>    if (entry != new_entry)
>      // swap @entry to @new_entry at attach location
>    bpf_mprog_commit(entry);
>    if (!new_entry)
>      // free bpf_mprog_bundle
> }
> and make bpf_mprog_peer internal to mprog. It will also allow removing
> BPF_MPROG_FREE vs SWAP distinction. peer is hidden.
>    if (entry != new_entry)
>       // update
> also will be easier to read inside tcx code without looking into mprog details.

I'm actually thinking if it's possible to simplify it even further.
For example, do we even need a separate bpf_mprog_{attach,detach} and
bpf_mprog_commit()? So far it seems like bpf_mprog_commit() is
inevitable in case of success of attach/detach, so we might as well
just do it as the last step of attach/detach operation.

The only problem seems to be due to bpf_mprog interface doing this
optimization of replacing stuff in place, if possible, and allowing
the caller to not do the swap. How important is it to avoid that swap
of a bpf_mprog_fp (pointer)? Seems pretty cheap (and relatively rare
operation), so I wouldn't bother optimizing this.

So how about we just say that there is always a swap. Internally in
bpf_mprog_bundle current entry is determined based on revision&1. We
can have bpf_mprog_cur_entry() to return a proper pointer after
commit. Or bpf_mprog_attach() can return proper new entry as output
parameter, whichever is preferable.

As for BPF_MPROG_FREE. That seems like an unnecessary complication as
well. Caller can just check bpf_mprog_total() quickly, and if it
dropped to zero assume FREE. Unless there is something more subtle
there?

With the above, the interface will be much simpler, IMO. You just do
bpf_mprog_attach/detach, and then swap pointer to new bpf_mprog_entry.
Then you can check bpf_mprog_total() for zero, and clean up further,
if necessary.

We assume the caller has a proper locking, so all the above should be non-racy.

BTW, combining commit with attach allows us to avoid that relatively
big bpf_mprog_cp array on the stack as well, because we will be able
to update bundle->cp_items in-place.

The only (I believe :) ) big assumption I'm making in all of the above
is that commit is inevitable and we won't have a situation where we
start attach, update fp/cpp, and then decide to abort instead of going
for commit. Is this possible? Can we avoid it by careful checks
upfront and doing attach as last step that cannot be undone?

P.S. I guess one bit that I might have simplified is that
synchronize_rcu() + bpf_prog_put(), but I'm not sure exactly why we
put prog after sync_rcu. But if it's really necessary (and I assume it
is) and is a blocker for the proposal above, then maybe the interface
should delegate that to the caller (i.e., optionally return replaced
prog pointer from attach/detach) or use call_rcu() with callback?