Hi Alexei,

[...]

> I guess we never clearly defined what 'needs_zext' is supposed to be,
> so it wouldn't be fair to call 32-bit JITs buggy.
> But we better address this issue now.
> This 32-bit zeroing after LDX hurts mips64, s390, ppc64, riscv64.
> I believe all 4 JITs emit proper zero extension into 64-bit register
> by using single cpu instruction,
> but they also define bpf_jit_needs_zext() as true,
> so extra BPF_ZEXT_REG() is added by the verifier
> and it is a pure run-time overhead.

I just realised that these zext instructions will not be a run-time
overhead, because the JITs ignore them. For example, s390 does:

	case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */
	case BPF_LDX | BPF_PROBE_MEM | BPF_B:
		/* llgc %dst,0(off,%src) */
		EMIT6_DISP_LH(0xe3000000, 0x0090, dst_reg, src_reg, REG_0, off);
		jit->seen |= SEEN_MEM;
		if (insn_is_zext(&insn[1]))
			insn_count = 2; /* this will skip the next zext instruction */
		break;

and powerpc does this after LDX:

	if (size != BPF_DW && insn_is_zext(&insn[i + 1]))
		addrs[++i] = ctx->idx * 4;

> It's better to remove
> 	if (t != SRC_OP)
> 		return BPF_SIZE(code) == BPF_DW;
> from is_reg64() to avoid adding BPF_ZEXT_REG() insn
> and fix 32-bit JITs at the same time.
> RISCV32, PowerPC32, x86-32 JITs fixed in the first 3 patches
> to always zero upper 32-bit after LDX and
> then 4th patch to remove these two lines.

I have sent the patches for the above, although I think this
optimization is useful: zero extension after LDX is only required
when the loaded value is later used as a 64-bit value. If that is
not the case, the verifier will not emit the zext, and the 32-bit
JITs will emit one fewer instruction, because they rely on the
verifier to insert the zext for them where it is required.

Link to patch series:
https://lore.kernel.org/bpf/20230912224654.6556-1-puranjay12@xxxxxxxxx/T/#t

Thanks,
Puranjay