Re: [RFC] bpf: add bpf_link support for BPF_NETFILTER programs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Florian Westphal <fw@xxxxxxxxx> writes:

> Doesn't apply, doesn't work -- there is no BPF_NETFILTER program type.
>
> Sketches the uapi.  Example usage:
>
> 	union bpf_attr attr = { };
>
> 	attr.link_create.prog_fd = progfd;
> 	attr.link_create.attach_type = BPF_NETFILTER;
> 	attr.link_create.netfilter.pf = PF_INET;
> 	attr.link_create.netfilter.hooknum = NF_INET_LOCAL_IN;
> 	attr.link_create.netfilter.priority = -128;
>
> 	err = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
>
> ... this would attach progfd to ipv4:input hook.
>
> Is BPF_LINK the right place?  Hook gets removed automatically if the calling program
> exits, afaict this is intended.

Yes, this is indeed intended for bpf_link. This plays well with
applications that use the API and stick around (because things get
cleaned up after them automatically even if they crash, say), but it
doesn't work so well for programs that don't (which, notably, includes
command line utilities like 'nft').

This is why I personally never really liked those semantics for
networking hooks: If I run a utility that attaches an XDP program I
generally expect that to stick around until the netdev disappears unless
something else explicitly removes it. (Yes you can pin a bpf_link, but
then you have the opposite problem: if the netdev disappears some entity
has to remove the pinned link, or you'll have a zombie program present
in the kernel until the next reboot).

For XDP and TC users can choose between bpf_link and netlink for
attachment and get one of the two semantics (goes away on close or stays
put). Not sure if it would make sense to do the same for nftables?

> Should a program running in init_netns be allowed to attach hooks in other netns too?
>
> I could do what BPF_LINK_TYPE_NETNS is doing and fetch net via
> get_net_ns_by_fd(attr->link_create.target_fd);

We don't allow that for any other type of BPF program; the expectation
is that the entity doing the attachment will move to the right ns first.
Is there any particular use case for doing something different for
nftables?

> For the actual BPF_NETFILTER program type I plan to follow what the bpf
> flow dissector is doing, i.e. pretend prototype is
>
> func(struct __sk_buff *skb)
>
> but pass a custom program specific context struct on kernel side.
> Verifier will rewrite accesses as needed.

This sounds reasonable, and also promotes code reuse between program
types (say, you can write some BPF code to parse a packet and reuse it
between the flow dissector, TC and netfilter).

> Things like nf_hook_state->in (net_device) could then be exposed via
> kfuncs.

Right, so like:

state = bpf_nf_get_hook_state(ctx); ?

Sounds OK to me.

> nf_hook_run_bpf() (c-function that creates the program context and
> calls the real bpf prog) would be "updated" to use the bpf dispatcher to
> avoid the indirect call overhead.

What 'bpf dispatcher' are you referring to here? We have way too many
things with that name :P

> Does that seem ok to you?  I'd ignore the bpf dispatcher for now and
> would work on the needed verifier changes first.

Getting something that works first seems reasonable, sure! :)

-Toke



[Index of Archives]     [Netfitler Users]     [Berkeley Packet Filter]     [LARTC]     [Bugtraq]     [Yosemite Forum]

  Powered by Linux