Sorry for the late reply, but trying this out now - and have a question:

On Thu, Jun 8, 2023 at 5:25 PM Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>
> On Thu, Jun 8, 2023 at 12:46 PM Jamal Hadi Salim <jhs@xxxxxxxxxxxx> wrote:
> >
> > Hi Daniel,
> >
> > On Thu, Jun 8, 2023 at 6:12 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
> > >
> > > Hi Jamal,
> > >
> > > On 6/8/23 3:25 AM, Jamal Hadi Salim wrote:
> > > [...]
> > > > A general question (which i think i asked last time as well): who
> > > > decides what comes after/before what prog in this setup? And would
> > > > that same entity not have been able to make the same decision using tc
> > > > priorities?
> > >
> > > Back in the first version of the series I initially coded up this option
> > > that the tc_run() would basically be a fake 'bpf_prog' and it would have,
> > > say, fixed prio 1000. It would get executed via tcx_run() when iterating
> > > via bpf_mprog_foreach_prog() where bpf_prog_run() is called, and then users
> > > could pick for native BPF prio before or after that. But then the feedback
> > > was that sticking to prio is a bad user experience which led to the
> > > development of what is in patch 1 of this series (see the details there).
> > >
> >
> > Thanks. I read the commit message in patch 1 and followed the thread
> > back including some of the discussion we had and i am still
> > disagreeing that this couldnt be solved with a smart priority based
> > scheme - but i think we can move on since this is standalone and
> > doesnt affect tc.
> >
> > Daniel - i am still curious in the new scheme of things how would
> > cilium vs datadog food fight get resolved without some arbitration
> > entity?
> >
> > > > The idea of protecting programs from being unloaded is very welcome
> > > > but feels would have made sense to be a separate patchset (we have
> > > > good need for it). Would it be possible to use that feature in tc and
> > > > xdp?
> > > BPF links are supported for XDP today, just tc BPF is one of the few
> > > remainders where it is not the case, hence the work of this series. What
> > > XDP lacks today however is multi-prog support. With the bpf_mprog concept
> > > that could be addressed with that common/uniform api (and Andrii expressed
> > > interest in integrating this also for cgroup progs), so yes, various hook
> > > points/program types could benefit from it.
> >
> > Is there some sample XDP related i could look at? Let me describe our
> > use case: lets say we load an ebpf program foo attached to XDP of a
> > netdev and then something further upstream in the stack is consuming
> > the results of that ebpf XDP program. For some reason someone, at some
> > point, decides to replace the XDP prog with a different one - and the
> > new prog does a very different thing. Could we stop the replacement
> > with the link mechanism you describe? i.e the program is still loaded
> > but is no longer attached to the netdev.
>
> If you initially attached an XDP program using BPF link api
> (LINK_CREATE command in bpf() syscall), then subsequent attachment to
> the same interface (of a new link or program with BPF_PROG_ATTACH)
> will fail until the current BPF link is detached through closing its
> last fd.
>

So this works as advertised. The problem is however not totally solved
because it seems we need a process that's alive to hold the ownership.
If we had a daemon then that would solve it i think (we dont).
Alternatively, you pin the link.
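Something like the following minimal libbpf sketch is what i mean by
pinning (object file, program name, ifindex and pin path below are
made up for illustration):

#include <bpf/libbpf.h>

int main(void)
{
        struct bpf_object *obj;
        struct bpf_program *prog;
        struct bpf_link *link;
        int ifindex = 2;        /* normally resolved via if_nametoindex() */

        obj = bpf_object__open_file("xdp_foo.bpf.o", NULL);
        if (!obj || bpf_object__load(obj))
                return 1;

        prog = bpf_object__find_program_by_name(obj, "xdp_foo");
        if (!prog)
                return 1;

        /* LINK_CREATE under the hood; a later BPF_PROG_ATTACH or a second
         * link on the same ifindex fails until this link's last fd is gone
         */
        link = bpf_program__attach_xdp(prog, ifindex);
        if (!link)
                return 1;

        /* keep the attachment alive after this process exits; the pin can
         * still be removed by anyone with write access to bpffs
         */
        if (bpf_link__pin(link, "/sys/fs/bpf/xdp_foo_link"))
                return 1;

        return 0;
}

That way the link reference is held by the pinned file in bpffs rather
than by a live process's fd.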
The pinning part can be circumvented, unless i misunderstood, i.e.
anybody with the right permissions can remove it. Am I missing
something?

cheers,
jamal

> That is, until we allow multiple attachments of XDP programs to the
> same network interface. But even then, no one will be able to
> accidentally replace attached link, unless they have that link FD and
> replace underlying BPF program.
>
> > > >
> > > >> +struct tcx_entry {
> > > >> +	struct bpf_mprog_bundle bundle;
> > > >> +	struct mini_Qdisc __rcu *miniq;
> > > >> +};
> > > >> +
> > > >
> > > > Can you please move miniq to the front? From where i sit this looks:
> > > > struct tcx_entry {
> > > >         struct bpf_mprog_bundle bundle
> > > > __attribute__((__aligned__(64))); /* 0 3264 */
> > > >
> > > > /* XXX last struct has 36 bytes of padding */
> > > >
> > > > /* --- cacheline 51 boundary (3264 bytes) --- */
> > > >         struct mini_Qdisc * miniq; /* 3264 8 */
> > > >
> > > > /* size: 3328, cachelines: 52, members: 2 */
> > > > /* padding: 56 */
> > > > /* paddings: 1, sum paddings: 36 */
> > > > /* forced alignments: 1 */
> > > > } __attribute__((__aligned__(64)));
> > > >
> > > > That is a _lot_ of cachelines - at the expense of the status quo
> > > > clsact/ingress qdiscs which access miniq.
> > >
> > > Ah yes, I'll fix this up.
> >
> > Thanks.
> >
> > cheers,
> > jamal
> > > Thanks,
> > > Daniel
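P.S. To be explicit about the miniq comment above: what i am asking for
is roughly the pointer ahead of the 64-byte-aligned bundle, so the
clsact/ingress miniq access stays in the first cacheline, something
like the sketch below (exact layout of course up to Daniel):

struct tcx_entry {
        struct mini_Qdisc __rcu *miniq;   /* hot for clsact/ingress fast path */
        struct bpf_mprog_bundle bundle;   /* big, forced 64-byte alignment */
};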