On Mon, Jul 20, 2020 at 4:35 PM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote: > > On Sun, Jul 19, 2020 at 10:02:48PM -0700, Andrii Nakryiko wrote: > > On Thu, Jul 16, 2020 at 7:06 PM Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > > > > > > On Thu, Jul 16, 2020 at 12:50:05PM +0200, Toke Høiland-Jørgensen wrote: > > > > Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> writes: > > > > > > > > > On Wed, Jul 15, 2020 at 03:09:02PM +0200, Toke Høiland-Jørgensen wrote: > > > > >> > > > > >> + if (tgt_prog_fd) { > > > > >> + /* For now we only allow new targets for BPF_PROG_TYPE_EXT */ > > > > >> + if (prog->type != BPF_PROG_TYPE_EXT || > > > > >> + !btf_id) { > > > > >> + err = -EINVAL; > > > > >> + goto out_put_prog; > > > > >> + } > > > > >> + tgt_prog = bpf_prog_get(tgt_prog_fd); > > > > >> + if (IS_ERR(tgt_prog)) { > > > > >> + err = PTR_ERR(tgt_prog); > > > > >> + tgt_prog = NULL; > > > > >> + goto out_put_prog; > > > > >> + } > > > > >> + > > > > >> + } else if (btf_id) { > > > > >> + err = -EINVAL; > > > > >> + goto out_put_prog; > > > > >> + } else { > > > > >> + btf_id = prog->aux->attach_btf_id; > > > > >> + tgt_prog = prog->aux->linked_prog; > > > > >> + if (tgt_prog) > > > > >> + bpf_prog_inc(tgt_prog); /* we call bpf_prog_put() on link release */ > > > > > > > > > > so the first prog_load cmd will beholding the first target prog? > > > > > This is complete non starter. > > > > > You didn't mention such decision anywhere. > > > > > The first ext prog will attach to the first dispatcher xdp prog, > > > > > then that ext prog will multi attach to second dispatcher xdp prog and > > > > > the first dispatcher prog will live in the kernel forever. > > > > > > > > Huh, yeah, you're right that's no good. Missing that was a think-o on my > > > > part, sorry about that :/ > > > > > > > > > That's not what we discussed back in April. > > > > > > > > No, you mentioned turning aux->linked_prog into a list. However once I > > > > started looking at it I figured it was better to actually have all this > > > > (the trampoline and ref) as part of the bpf_link structure, since > > > > logically they're related. > > > > > > > > But as you pointed out, the original reference sticks. So either that > > > > needs to be removed, or I need to go back to the 'aux->linked_progs as a > > > > list' idea. Any preference? > > > > > > Good question. Back then I was thinking about converting linked_prog into link > > > list, since standalone single linked_prog is quite odd, because attaching ext > > > prog to multiple tgt progs should have equivalent properties across all > > > attachments. > > > Back then bpf_link wasn't quite developed. > > > Now I feel moving into bpf_tracing_link is better. > > > I guess a link list of bpf_tracing_link-s from 'struct bpf_prog' might work. > > > At prog load time we can do bpf_link_init() only (without doing bpf_link_prime) > > > and keep this pre-populated bpf_link with target bpf prog and trampoline > > > in a link list accessed from 'struct bpf_prog'. > > > Then bpf_tracing_prog_attach() without extra tgt_prog_fd/btf_id would complete > > > that bpf_tracing_link by calling bpf_link_prime() and bpf_link_settle() > > > without allocating new one. > > > Something like: > > > struct bpf_tracing_link { > > > struct bpf_link link; /* ext prog pointer is hidding in there */ > > > enum bpf_attach_type attach_type; > > > struct bpf_trampoline *tr; > > > struct bpf_prog *tgt_prog; /* old aux->linked_prog */ > > > }; > > > > > > ext prog -> aux -> link list of above bpf_tracing_link-s > > > > > > It's a circular reference, obviously. > > > Need to think through the complications and locking. > > > > > > bpf_tracing_prog_attach() with tgt_prog_fd/btf_id will alloc new bpf_tracing_link > > > and will add it to a link list. > > > > > > Just a rough idea. I wonder what Andrii thinks. > > > > > > > I need to spend more time reading existing and new code to see all the > > details, but I'll throw a slightly different proposal and let you guys > > shoot it down. > > > > So, what if instead of having linked_prog (as bpf_prog *, refcnt'ed), > > at BPF_PROG_LOAD time we just record the target prog's ID. BPF > > verifier, when doing its target prog checks would attempt to get > > bpf_prog * reference; if by that time the target program is gone, > > fail, of course. If not, everything proceeds as is, at the end of > > verification target_prog is put until attach time. > > > > Then at attach time, we either go with pre-recorded (in > > prog->aux->linked_prog_id) target prog's ID or we get a new one from > > RAW_TP_OPEN tgt_prog_fd. Either way, we bump refcnt on that target > > prog and keep it with bpf_tracing_link (so link on detach would put > > target_prog, that way it doesn't go away while EXT prog is attached). > > Then do all the compatibility checks, and if everything works out, > > bpf_tracing_link gets created, we record trampoline there, etc, etc. > > Basically, instead of having an EXT prog holding a reference to the > > target prog, only attachment (bpf_link) does that, which conceptually > > also seems to make more sense to me. For verification we store prog ID > > and don't hold target prog at all. > > > > > > Now, there will be a problem once you attach EXT prog to a new XDP > > root program and release a link against the original XDP root program. > > First, I hope I understand the desired sequence right, here's an > > example: > > > > 1. load XDP root prog X > > 2. load EXT prog with target prog X > > 3. attach EXT prog to prog X > > 4. load XDP root prog Y > > 5. attach EXT prog to prog Y (Y and X should be "compatible") > > 6. detach prog X (close bpf_link) > > > > Is that the right sequence? > > > > If yes, then the problem with storing ID of prog X in EXT > > prog->aux->linked_prog_id is that you won't be able to re-attach to > > new prog Z, because there won't be anything to check compatibility > > against (prog X will be long time gone). > > > > So we can do two things here: > > > > 1. on attach, replace ext_prog->aux->linked_prog_id with the latest > > attached prog (prog Y ID from above example) > > 2. instead of recording target program FD/ID, capture BTF FD and/or > > enough BTF information for checking compatibility. > > > > Approach 2) seems like conceptually the right thing to do (record type > > info we care about, not an **instance** of BPF program, compatible > > with that type info), but technically might be harder. > > I've read your proposal couple times and still don't get what you're > trying to solve with either ID or BTF info recording. > So that target prog doesn't get refcnt-ed? What's a problem with it? > Currently it's being refcnt-d in aux->linked_prog. > What I'm proposing about is to convert aux->linked_prog into a link list > of bpf_tracing_links which will contain linked_prog inside. > Conceptually that's what bpf_link is doing. It links two progs. > EXT prog is recorded in 'struct bpf_link' and > the target prog is recorded in 'struct bpf_tracing_link'. > So from bpf_link perspective everything seems clean to me. > The link list of bpf_tracing_link-s in EXT_prog->aux is only to preserve > existing api of prog_load cmd. Right, I wanted to avoid taking a refcnt on aux->linked_prog during PROG_LOAD. The reason for that was (and still is) that I don't get who and when has to bpf_prog_put() original aux->linked_prog to allow the prog X to be freed. I.e., after you re-attach to prog Y, how prog X is released (assuming no active bpf_link is keeping it from being freed)? That's my biggest confusion right now. I also didn't like the idea of half-creating bpf_tracing_link on PROG_LOAD and then turning it into a real link with bpf_link_settle on attach. That sounded like a hack to me. But now I'm also confused why we need to turn aux->linked_prog into a list. Seems like we need it only for old-style attach that doesn't specify tgt_prog_fd, no? Only in that case we'll use aux->linked_prog. Otherwise we know the target prog from tgt_prog_fd. So I'll be honest that I don't get the whole idea of maintaining a list of bpf_tracing_links. It seems like it should be possible to make bpf_tracing_link decoupled from any prog's aux and have their own independent lifetime. > > As far as step 5: attach EXT prog to prog Y (Y and X should be "compatible") > The chance of failure there should be minimal. libxdp/libdispatcher will > prepare rootlet XDP prog. It should really make sure that Y and X are compatible. > This should be invisible to users. Right, of course, but the kernel needs to validate that anyways, which is why I pointed that out. Or are you saying we should just assume that they are valid? > > In addition we still need bpf_link_update_hook() I was talking about in April. > The full sequence is: > first user process: > 1. load XDP root prog X > 1' root_link = attach X to eth0 > 2. load EXT prog with target prog X > 3. app1_link_fd = attach EXT prog to prog X > second user process: > 4. load XDP root prog Y > 4'. find EXT prog of the first user process > 5. app2_link_fd = attach EXT prog to prog Y (Y and X should be "compatible") > 6. bpf_link_update(root_link, X, Y); // now packet flows into Y and into EXT > // while EXT is attached in two places > 7. app1_link_fd' = FD in second process that points to the same tracing link > as app1_link_fd in the first process. > bpf_link_update_hook(app1_link_fd', app2_link_fd) > the last operation need to update bpf_tracing_link that is held by app1 > (which is the first user process) from the second user process. It needs to > retarget (update_hook) inside bpf_tracing_link from X to Y. > Since the processes are more or less not aware of each other. > One firewall holds link_fd that connects EXT to X, > but the second firewall (via libxdp) is updaing that tracing link > to re-hook EXT into Y. Yeah, should be doable given that bpf_trampoline is independently refcounted.