On Mon, Jan 30, 2023 at 07:01:15PM +0100, Florian Westphal wrote: > Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > > > Is BPF_LINK the right place? Hook gets removed automatically if the calling program > > > exits, afaict this is intended. Yes. bpf_link is the right model. I'd also allow more than one BPF_NETFILTER prog at the hook. When Daniel respins his tc bpf_link set there will be a way to do that for tc and hopefully soon for xdp. For netfilter hook we can use the same approach. > > > > Yes, this is indeed intended for bpf_link. This plays well with > > applications that use the API and stick around (because things get > > cleaned up after them automatically even if they crash, say), but it > > doesn't work so well for programs that don't (which, notably, includes > > command line utilities like 'nft'). > > Right, but I did not want to create a dependency on nfnetlink or > nftables netlink right from the start. > > > For XDP and TC users can choose between bpf_link and netlink for > > attachment and get one of the two semantics (goes away on close or stays > > put). Not sure if it would make sense to do the same for nftables? > > For nftables I suspect that, if nft can emit bpf, it would make sense to > pass the bpf descriptor together with nftables netlink, i.e. along with > the normal netlink data. > > nftables kernel side would then know to use the bpf prog for the > datapath instead of the interpreter and could even fallback to > interpreter. > > But for the raw hook use case that Alexei and Daniel preferred (cf. > initial proposal to call bpf progs from nf_tables interpreter) I think > that there should be no nftables dependency. > > I could add a new nfnetlink subtype for nf-bpf if bpf_link is not > appropriate as an alternative. Let's start with bpf_link and figure out netlink path when appropriate. > > > Should a program running in init_netns be allowed to attach hooks in other netns too? > > > > > > I could do what BPF_LINK_TYPE_NETNS is doing and fetch net via > > > get_net_ns_by_fd(attr->link_create.target_fd); > > > > We don't allow that for any other type of BPF program; the expectation > > is that the entity doing the attachment will move to the right ns first. > > Is there any particular use case for doing something different for > > nftables? > > Nope, not at all. I was just curious. So no special handling for > init_netns needed, good. > > > > For the actual BPF_NETFILTER program type I plan to follow what the bpf > > > flow dissector is doing, i.e. pretend prototype is > > > > > > func(struct __sk_buff *skb) > > > > > > but pass a custom program specific context struct on kernel side. > > > Verifier will rewrite accesses as needed. > > > > This sounds reasonable, and also promotes code reuse between program > > types (say, you can write some BPF code to parse a packet and reuse it > > between the flow dissector, TC and netfilter). > > Ok, thanks. > > > > Things like nf_hook_state->in (net_device) could then be exposed via > > > kfuncs. > > > > Right, so like: > > > > state = bpf_nf_get_hook_state(ctx); ? > > > > Sounds OK to me. > > Yes, something like that. Downside is that the nf_hook_state struct > is no longer bound by any abi contract, but as I understood it thats > fine. I'd steer clear from new abi-s. Don't look at uapi __sk_buff model. It's not a great example to follow. Just pass kernel nf_hook_state into bpf prog and let program deal with changes to it via CORE. The prog will get a defition of 'struct nf_hook_state' from vmlinux.h or via private 'struct nf_hook_state___flavor' with few fields defined that prog wants to use. CORE will deal with offset adjustments. That's a lot less kernel code. No need for asm style ctx rewrites. Just see how much kernel code we already burned on *convert_ctx_access(). We cannot remove this tech debt due to uapi. When you pass struct nf_hook_state directly none of it is needed.