On Fri, Aug 1, 2014 at 9:06 AM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: > On Thu, Jul 31, 2014 at 02:02:19PM -0700, Alexei Starovoitov wrote: >> On Thu, Jul 31, 2014 at 12:40 PM, Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote: >> > On Wed, Jul 30, 2014 at 08:34:16PM -0700, Alexei Starovoitov wrote: >> >> clean up names related to socket filtering and bpf in the following way: >> >> - everything that deals with sockets keeps 'sk_*' prefix >> >> - everything that is pure BPF is changed to 'bpf_*' prefix >> >> >> >> split 'struct sk_filter' into >> >> struct sk_filter { >> >> atomic_t refcnt; >> >> struct rcu_head rcu; >> >> struct bpf_prog *prog; >> >> }; >> > >> > I think you can use 'struct bpf_prog prog' instead so the entire >> > sk_filter remains in the same memory blob (as it is before this >> > patch). >> > >> > You can add an inline function to retrieve the bpg prog from the >> > filter: >> > >> > static inline struct bpf_prog *sk_filter_bpf(struct sk_filter *) >> > >> > and use it whenever possible to fetch the bpf program. I'm suggesting >> > this because we can use the zero array size in the socket filtering >> > abstraction later on, if the function above is used, this just needs >> > one line in that function to be updated to fetch the program from the >> > placeholder. >> >> correct. It would speed up SK_RUN_FILTER macro a little and I've >> considered it, but decided to go with the pointer for two reasons: >> 1.In sk_attach_filter() the bpf_prog is allocated, then reallocated >> as part of bpf_prepare_filter(). My patch #1 cleans up that part to >> avoid 'struct sock *' dependency, so all bpf_* functions work >> purely with bpf_prog... If bpf_prog is embedded inside sk_filter, >> bpf_prepare_filter would need to have a callback to reallocate >> the container struct and pass this callback through the chain >> of calls, which is ugly > > I think you can allocate the sk_filter once you get the final bpf > program, then you can memcpy() it. This adds some extra overhead in > the sk_attach_filter(), but that path is executed from user context > and it's also a rare operation (only once to attach the filter). It's > still not going to be a beauty, but IMO it's worth to focus on getting > that little speed up in the packet path at the cost of adding some > overhead on the socket attach path. memcpy of 'bpf_prog' is not just 'not a beauty', it won't work, since bpf_prog is freed via work_queue due to JIT. See bpf_jit_free() There are few other ways I could think of embedding bpf_prog inside sk_filter: 1. have a release callback, 2. extra field inside bpf_prog that will tell outer size of the container (so that all kfree(bpf_prog*) can adjust the pointer before freeing). 3. extra field which is direct pointer to the outer container, so that kfree(bpf_prog*) can use it instead of freeing the bpf_prog itself. In all cases it means hacking jit compilers for all architectures in a very ugly way. Say, we pick #3 (which is the least ugly), see how kfree(fp) will change to kfree(fp->pointer_to_outer_container). it will be not obvious at all to the reader that bpf_prog itself will be freed by such kfree()... so we'd need to add nasty comments everywhere... imo one extra load of fp->prog in critical path of SK_RUN_FILTER is not worth this mess. >> 2. having it as a pointer helps nft in the long run, since whole of >> bpf_prog doesn't stay around inside sk_filter when it's not used. > > If you put into place the inline wrapper function that I'm proposing > above and you use it instead of fp->bpf_prog, I think there should be > not interference: > > static inline struct bpf_prog *sk_filter_bpf(struct sk_filter *sk) > { > - return &sk->bpf_prog; > + return (struct bpf_prog *)sk->data; > } This is not pretty either, since we'd need to worry that alignment of sk->data field must match alignment of bpf_prog. Probably can make it 'long data[0]'... all of these just feels wrong to avoid single load in SK_RUN_FILTER(). -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html