On Fri, Nov 15, 2019 at 08:56:59AM +0100, Björn Töpel wrote:
> On Fri, 15 Nov 2019 at 01:30, Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
> >
> [...]
> >
> > Could you try optimizing emit_mov_imm64() to recognize s32 ?
> > iirc there was a single x86 insns that could move and sign extend.
> > That should cut down on bytecode size and probably make things a bit faster?
> > Another alternative is compare lower 32-bit only, since on x86-64 upper 32
> > should be ~0 anyway for bpf prog pointers.
>
> Good ideas, thanks! I'll do the optimization, extend it to >4 entries
> (as Toke suggested), and do a non-RFC respin.
>
> > Looking at bookkeeping code, I think I should be able to generalize bpf
> > trampoline a bit and share the code for bpf dispatch.
>
> Ok, good!
>
> > Could you also try aligning jmp target a bit by inserting nops?
> > Some x86 cpus are sensitive to jmp target alignment. Even without considering
> > JCC bug it could be helpful. Especially since we're talking about XDP/AF_XDP
> > here that will be pushing millions of calls through bpf dispatch.
> >
>
> Yeah, I need to address the Jcc bug anyway, so that makes sense.
>
> Another thought; I'm using the fentry nop as patch point, so it wont
> play nice with other users of fentry atm -- but the plan is to move to
> Steve's *_ftrace_direct work at some point, correct?

Yes. I'll start playing with reg/mod/unreg_ftrace_direct on Monday.
Steven has a bunch more in his tree for merging, so I cannot just pull
all of ftrace api features into bpf-next. So "be nice to other fentry users"
would have to be done during merge window or shortly after in bpf-next tree
after window closes. I think it's fine. In bpf dispatch case it's really
one dummy function we're talking about. If it was marked 'notrace'
from get go no one would blink. It's a dummy function
not interesting for ftrac-ing and not interesting from live patching pov.
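
To make the emit_mov_imm64() idea quoted above concrete, here is a rough
standalone sketch. It is not the kernel JIT code: emit_mov_imm64_sketch(),
fits_s32() and put_le() are hypothetical names made up for illustration, and
the register is passed as a raw x86-64 register number (0 = rax, 8 = r8).
The assumption is that a value which fits in s32 can use the 7-byte
"mov r/m64, imm32" form (REX.W C7 /0, immediate sign-extended by the CPU)
instead of the 10-byte movabs (REX.W B8+r, imm64).

```c
#include <stddef.h>
#include <stdint.h>

/* True if imm survives a round trip through a sign-extended 32-bit value. */
static int fits_s32(uint64_t imm)
{
	return (int64_t)imm == (int64_t)(int32_t)(uint32_t)imm;
}

/* Store the low n bytes of v in little-endian order; returns n. */
static size_t put_le(uint8_t *buf, uint64_t v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		buf[i] = (uint8_t)(v >> (8 * i));
	return n;
}

/*
 * Hypothetical helper: emit "mov reg, imm" for a 64-bit destination.
 * buf must have room for at least 10 bytes. Returns the number of
 * bytes written: 7 for the sign-extended imm32 form, 10 for movabs.
 */
size_t emit_mov_imm64_sketch(uint8_t *buf, unsigned reg, uint64_t imm)
{
	size_t len = 0;

	/* REX.W selects 64-bit operand size; REX.B reaches r8..r15. */
	buf[len++] = 0x48 | ((reg >> 3) & 1);

	if (fits_s32(imm)) {
		buf[len++] = 0xC7;              /* mov r/m64, imm32 */
		buf[len++] = 0xC0 | (reg & 7);  /* ModRM: mod=11, /0, rm=reg */
		len += put_le(buf + len, imm, 4);
	} else {
		buf[len++] = 0xB8 | (reg & 7);  /* movabs r64, imm64 */
		len += put_le(buf + len, imm, 8);
	}
	return len;
}
```

With this sketch, an address such as 0xffffffff81000000 (upper 32 bits all
ones) takes the short form, which is the case the "upper 32 should be ~0"
remark is about, while a value like 0x100000000 still needs the full movabs.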