Florian Westphal <fw@xxxxxxxxx> writes:

> Doesn't apply, doesn't work -- there is no BPF_NETFILTER program type.
>
> Sketches the uapi.  Example usage:
>
>     union bpf_attr attr = { };
>
>     attr.link_create.prog_fd = progfd;
>     attr.link_create.attach_type = BPF_NETFILTER;
>     attr.link_create.netfilter.pf = PF_INET;
>     attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN;
>     attr.link_create.netfilter.priority = -128;
>
>     err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
>
> ... this would attach progfd to ipv4:input hook.
>
> Is BPF_LINK the right place?  Hook gets removed automatically if the calling program
> exits, afaict this is intended.

Yes, this is indeed intended for bpf_link. This plays well with
applications that use the API and stick around (because things get
cleaned up after them automatically even if they crash, say), but it
doesn't work so well for programs that don't (which, notably, includes
command line utilities like 'nft').

This is why I personally never really liked those semantics for
networking hooks: If I run a utility that attaches an XDP program I
generally expect that to stick around until the netdev disappears unless
something else explicitly removes it. (Yes you can pin a bpf_link, but
then you have the opposite problem: if the netdev disappears some entity
has to remove the pinned link, or you'll have a zombie program present
in the kernel until the next reboot).

For XDP and TC users can choose between bpf_link and netlink for
attachment and get one of the two semantics (goes away on close or stays
put). Not sure if it would make sense to do the same for nftables?

> Should a program running in init_netns be allowed to attach hooks in other netns too?
>
> I could do what BPF_LINK_TYPE_NETNS is doing and fetch net via
> get_net_ns_by_fd(attr->link_create.target_fd);

We don't allow that for any other type of BPF program; the expectation
is that the entity doing the attachment will move to the right ns first.
Is there any particular use case for doing something different for nftables?

> For the actual BPF_NETFILTER program type I plan to follow what the bpf
> flow dissector is doing, i.e. pretend prototype is
>
> func(struct __sk_buff *skb)
>
> but pass a custom program specific context struct on kernel side.
> Verifier will rewrite accesses as needed.

This sounds reasonable, and also promotes code reuse between program
types (say, you can write some BPF code to parse a packet and reuse it
between the flow dissector, TC and netfilter).

> Things like nf_hook_state->in (net_device) could then be exposed via
> kfuncs.

Right, so like:

state = bpf_nf_get_hook_state(ctx); ?

Sounds OK to me.

> nf_hook_run_bpf() (c-function that creates the program context and
> calls the real bpf prog) would be "updated" to use the bpf dispatcher to
> avoid the indirect call overhead.

What 'bpf dispatcher' are you referring to here? We have way too many
things with that name :P

> Does that seem ok to you?  I'd ignore the bpf dispatcher for now and
> would work on the needed verifier changes first.

Getting something that works first seems reasonable, sure! :)

-Toke
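
For illustration, the "move to the right ns first" expectation mentioned
above could look roughly like this in userspace. This is only a sketch:
the helper name enter_netns_by_path() is made up here, and it assumes the
caller holds the privileges setns(2) requires for a network namespace
(normally CAP_SYS_ADMIN). It uses only open(2), setns(2) and close(2).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an existing API): switch the calling thread
 * into the network namespace referred to by ns_path, for example
 * "/run/netns/<name>" or "/proc/<pid>/ns/net".
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
static int enter_netns_by_path(const char *ns_path)
{
	int fd, ret, saved_errno;

	fd = open(ns_path, O_RDONLY | O_CLOEXEC);
	if (fd < 0)
		return -1;

	/* CLONE_NEWNET makes setns() reject anything but a net namespace. */
	ret = setns(fd, CLONE_NEWNET);

	/* Preserve the errno from setns() across close(). */
	saved_errno = errno;
	close(fd);
	errno = saved_errno;

	return ret;
}
```

Once this returns 0, a BPF_LINK_CREATE issued from the same thread would
presumably operate on the target namespace, so no target_fd would be
needed in the attach request itself.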