On Mon, Jul 10, 2023 at 5:23 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Mon, Jul 10, 2023 at 10:12:11PM +0200, Daniel Borkmann wrote: > > + * > > + * struct bpf_mprog_entry *entry, *peer; > > + * int ret; > > + * > > + * // bpf_mprog user-side lock > > + * // fetch active @entry from attach location > > + * [...] > > + * ret = bpf_mprog_attach(entry, [...]); > > + * if (ret >= 0) { > > + * peer = bpf_mprog_peer(entry); > > + * if (bpf_mprog_swap_entries(ret)) > > + * // swap @entry to @peer at attach location > > + * bpf_mprog_commit(entry); > > + * ret = 0; > > + * } else { > > + * // error path, bail out, propagate @ret > > + * } > > + * // bpf_mprog user-side unlock > > + * > > + * Detach case: > > + * > > + * struct bpf_mprog_entry *entry, *peer; > > + * bool release; > > + * int ret; > > + * > > + * // bpf_mprog user-side lock > > + * // fetch active @entry from attach location > > + * [...] > > + * ret = bpf_mprog_detach(entry, [...]); > > + * if (ret >= 0) { > > + * release = ret == BPF_MPROG_FREE; > > + * peer = release ? NULL : bpf_mprog_peer(entry); > > + * if (bpf_mprog_swap_entries(ret)) > > + * // swap @entry to @peer at attach location > > + * bpf_mprog_commit(entry); > > + * if (release) > > + * // free bpf_mprog_bundle > > + * ret = 0; > > + * } else { > > + * // error path, bail out, propagate @ret > > + * } > > + * // bpf_mprog user-side unlock > > Thanks for the doc. It helped a lot. > And when it's contained like this it's easier to discuss api. > It seems bpf_mprog_swap_entries() is trying to abstract the error code > away, but BPF_MPROG_FREE leaks out and tcx_entry_needs_release() > captures it with extra miniq_active twist, which I don't understand yet. > bpf_mprog_peer() is also leaking a bit of implementation detail. > Can we abstract it further, like: > > ret = bpf_mprog_detach(entry, [...], &new_entry); > if (ret >= 0) { > if (entry != new_entry) > // swap @entry to @new_entry at attach location > bpf_mprog_commit(entry); > if (!new_entry) > // free bpf_mprog_bundle > } > and make bpf_mprog_peer internal to mprog. It will also allow removing > BPF_MPROG_FREE vs SWAP distinction. peer is hidden. > if (entry != new_entry) > // update > also will be easier to read inside tcx code without looking into mprog details. I'm actually thinking if it's possible to simplify it even further. For example, do we even need a separate bpf_mprog_{attach,detach} and bpf_mprog_commit()? So far it seems like bpf_mprog_commit() is inevitable in case of success of attach/detach, so we might as well just do it as the last step of attach/detach operation. The only problem seems to be due to bpf_mprog interface doing this optimization of replacing stuff in place, if possible, and allowing the caller to not do the swap. How important is it to avoid that swap of a bpf_mprog_fp (pointer)? Seems pretty cheap (and relatively rare operation), so I wouldn't bother optimizing this. So how about we just say that there is always a swap. Internally in bpf_mprog_bundle current entry is determined based on revision&1. We can have bpf_mprog_cur_entry() to return a proper pointer after commit. Or bpf_mprog_attach() can return proper new entry as output parameter, whichever is preferable. As for BPF_MPROG_FREE. That seems like an unnecessary complication as well. Caller can just check bpf_mprog_total() quickly, and if it dropped to zero assume FREE. Unless there is something more subtle there? With the above, the interface will be much simpler, IMO. You just do bpf_mprog_attach/detach, and then swap pointer to new bpf_mprog_entry. Then you can check bpf_mprog_total() for zero, and clean up further, if necessary. We assume the caller has a proper locking, so all the above should be non-racy. BTW, combining commit with attach allows us to avoid that relatively big bpf_mprog_cp array on the stack as well, because we will be able to update bundle->cp_items in-place. The only (I believe :) ) big assumption I'm making in all of the above is that commit is inevitable and we won't have a situation where we start attach, update fp/cpp, and then decide to abort instead of going for commit. Is this possible? Can we avoid it by careful checks upfront and doing attach as last step that cannot be undone? P.S. I guess one bit that I might have simplified is that synchronize_rcu() + bpf_prog_put(), but I'm not sure exactly why we put prog after sync_rcu. But if it's really necessary (and I assume it is) and is a blocker for the proposal above, then maybe the interface should delegate that to the caller (i.e., optionally return replaced prog pointer from attach/detach) or use call_rcu() with callback?