On Fri, 15 Nov 2019 at 01:30, Alexei Starovoitov
<alexei.starovoitov@xxxxxxxxx> wrote:
> [...]
>
> Could you try optimizing emit_mov_imm64() to recognize s32 ?
> iirc there was a single x86 insns that could move and sign extend.
> That should cut down on bytecode size and probably make things a bit faster?
> Another alternative is compare lower 32-bit only, since on x86-64 upper 32
> should be ~0 anyway for bpf prog pointers.

Good ideas, thanks! I'll do the optimization, extend it to >4 entries
(as Toke suggested), and do a non-RFC respin.

> Looking at bookkeeping code, I think I should be able to generalize bpf
> trampoline a bit and share the code for bpf dispatch.

Ok, good!

> Could you also try aligning jmp target a bit by inserting nops?
> Some x86 cpus are sensitive to jmp target alignment. Even without considering
> JCC bug it could be helpful. Especially since we're talking about XDP/AF_XDP
> here that will be pushing millions of calls through bpf dispatch.
>

Yeah, I need to address the Jcc bug anyway, so that makes sense.

Another thought: I'm using the fentry nop as patch point, so it won't
play nice with other users of fentry atm -- but the plan is to move to
Steve's *_ftrace_direct work at some point, correct?


Björn
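
(For illustration only, a rough sketch of the s32 idea above -- this is
not the kernel's emit_mov_imm64(); the helper names and the buffer-based
emitter here are made up. If the 64-bit immediate sign-extends from 32
bits, a 7-byte "mov rax, imm32" (REX.W + C7 /0) can stand in for the
10-byte movabs.)

#include <stdint.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if imm survives a sign-extending 32-bit load. */
static bool fits_simm32(uint64_t imm)
{
	return (uint64_t)(int64_t)(int32_t)imm == imm;
}

/*
 * Illustrative emitter for "load imm64 into %rax" into a plain byte
 * buffer; returns the number of bytes written.
 */
static int emit_mov_rax_imm64(uint8_t *buf, uint64_t imm)
{
	uint8_t *p = buf;

	if (fits_simm32(imm)) {
		/* mov rax, imm32 (sign-extended): REX.W + C7 /0 id, 7 bytes */
		*p++ = 0x48;
		*p++ = 0xc7;
		*p++ = 0xc0;
		memcpy(p, &imm, 4);	/* little-endian low 32 bits */
		p += 4;
	} else {
		/* movabs rax, imm64: REX.W + B8 io, 10 bytes */
		*p++ = 0x48;
		*p++ = 0xb8;
		memcpy(p, &imm, 8);
		p += 8;
	}
	return p - buf;
}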
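
(Similarly, a hypothetical padding helper, not kernel code, just to show
the nop-alignment idea: pad the image so the jump target starts on, say,
a 16-byte boundary.)

#include <stdint.h>

/*
 * Hypothetical helper: fill with single-byte NOPs (0x90) until the next
 * emitted instruction starts on an "align"-byte boundary (align must be
 * a power of two). A real JIT would likely prefer multi-byte NOP
 * encodings, but the alignment idea is the same. Returns the number of
 * padding bytes written.
 */
static unsigned int emit_align_nops(uint8_t *buf, uintptr_t cur_addr,
				    unsigned int align)
{
	unsigned int pad = (align - (cur_addr & (align - 1))) & (align - 1);
	unsigned int i;

	for (i = 0; i < pad; i++)
		buf[i] = 0x90;	/* nop */
	return pad;
}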