On Mon, Dec 09, 2019 at 02:55:18PM +0100, Björn Töpel wrote: > + > +struct bpf_disp_prog { > + struct bpf_prog *prog; > + refcount_t users; > +}; > + > +struct bpf_dispatcher { > + void *func; > + struct bpf_disp_prog progs[BPF_DISPATCHER_MAX]; > + int num_progs; > + void *image; > + u32 image_off; > +}; > + > +static struct bpf_dispatcher *bpf_disp; > + > +static DEFINE_MUTEX(dispatcher_mutex); > + > +static struct bpf_dispatcher *bpf_dispatcher_lookup(void *func) > +{ > + struct bpf_dispatcher *d; > + void *image; > + > + if (bpf_disp) { > + if (bpf_disp->func != func) > + return NULL; > + return bpf_disp; > + } > + > + d = kzalloc(sizeof(*d), GFP_KERNEL); > + if (!d) > + return NULL; The bpf_dispatcher_lookup() above makes this dispatch logic a bit difficult to extend, since it works for only one bpf_disp and additional dispatchers would need hash table. Yet your numbers show that even with retpoline=off there is a performance benefit. So dispatcher probably can be reused almost as-is to accelerate sched_cls programs. What I was trying to say in my previous feedback on this subject is that lookup() doesn't need to exist. That 'void *func' doesn't need to be a function that dispatcher uses. It can be 'struct bpf_dispatcher *' instead. And lookup() becomes init(). Then bpf_dispatcher_change_prog() will be passing &bpf_dispatcher_xdp and bpf_dispatcher_xdp is defined via macro that supplies 'struct bpf_dispatcher' above and instantiated in particular .c file that used that macro. Then dispatcher can be used in more than one place. No need for hash table. Multiple dispatchers are instantiated in places that need them via macro. The code will look like: bpf_prog_change_xdp(struct bpf_prog *prev_prog, struct bpf_prog *prog) { bpf_dispatcher_change_prog(&bpf_dispatcher_xdp, prev_prog, prog); } Similarly sched_cls dispatcher for skb progs will do: bpf_dispatcher_change_prog(&bpf_dispatcher_tc, prev_prog, prog); wdyt?