On Wed, Sep 13, 2023 at 2:09 AM Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> wrote:
>
> On Tue, Sep 12, 2023 at 3:49 PM Puranjay Mohan <puranjay12@xxxxxxxxx> wrote:
> >
> > Hi Alexei,
> >
> > [...]
> >
> > > I guess we never clearly defined what 'needs_zext' is supposed to be,
> > > so it wouldn't be fair to call 32-bit JITs buggy.
> > > But we better address this issue now.
> > > This 32-bit zeroing after LDX hurts mips64, s390, ppc64, riscv64.
> > > I believe all 4 JITs emit proper zero extension into 64-bit register
> > > by using single cpu instruction,
> > > but they also define bpf_jit_needs_zext() as true,
> > > so extra BPF_ZEXT_REG() is added by the verifier
> > > and it is a pure run-time overhead.
> >
> > I just realised that these zext instructions will not be a runtime
> > overhead because the JITs ignore them.
> > Like
> > s390 does:
> >
> >     case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */
> >     case BPF_LDX | BPF_PROBE_MEM | BPF_B:
> >             /* llgc %dst,0(off,%src) */
> >             EMIT6_DISP_LH(0xe3000000, 0x0090, dst_reg, src_reg, REG_0, off);
> >             jit->seen |= SEEN_MEM;
> >             if (insn_is_zext(&insn[1]))
> >                     insn_count = 2; /* this will skip the next zext instruction */
> >             break;
> >
> > powerpc does after LDX:
> >
> >     if (size != BPF_DW && insn_is_zext(&insn[i + 1]))
> >             addrs[++i] = ctx->idx * 4;
>
> I see. Indeed the 64-bit JITs ignore this special zext insn after LDX.
>
> > > It's better to remove
> > > if (t != SRC_OP)
> > > return BPF_SIZE(code) == BPF_DW;
> > > from is_reg64() to avoid adding BPF_ZEXT_REG() insn
> > > and fix 32-bit JITs at the same time.
> > > RISCV32, PowerPC32, x86-32 JITs fixed in the first 3 patches
> > > to always zero upper 32-bit after LDX and
> > > then 4th patch to remove these two lines.
> >
> > I have sent the patches for above, although I think this optimization
> > is useful because
> > zero extension after LDX is only required when the loaded value is
> > later being used as
> > a 64-bit value. If it is not the case then the verifier will not emit
> > the zext and 32-bit JITs will emit
> > 1 less instruction because they expect the verifier to do the zext for
> > them where required.
>
> You're correct.
> Ok. Let's keep zext for LDX as-is.

Yes, let's do:

 	if (class == BPF_LDX) {
 		if (t != SRC_OP)
-			return BPF_SIZE(code) == BPF_DW;
+			return (BPF_SIZE(code) == BPF_DW || BPF_MODE(code) == BPF_MEMSX);

Thanks,
Puranjay
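For readers following the thread, here is a minimal standalone sketch of the destination-register (t != SRC_OP) case of the LDX branch in is_reg64() with the proposed change applied. It assumes the opcode field values from the uapi headers (BPF_MEMSX is 0x80). The helper name ldx_dst_is_reg64() is hypothetical and is not a kernel function. The real is_reg64() in kernel/bpf/verifier.c also handles the source operand and other instruction classes. The reasoning it models: a sign-extending load already writes all 64 bits of dst. If the verifier treated it as a 32-bit sub-register definition, it could insert BPF_ZEXT_REG() afterwards on JITs that need zext, and that would clear the sign-extended upper half.

```c
#include <stdbool.h>
#include <stdint.h>

/* Opcode field masks and values as defined in the uapi BPF headers. */
#define BPF_CLASS(code) ((code) & 0x07)
#define BPF_SIZE(code)  ((code) & 0x18)
#define BPF_MODE(code)  ((code) & 0xe0)

#define BPF_LDX   0x01
#define BPF_W     0x00
#define BPF_H     0x08
#define BPF_B     0x10
#define BPF_DW    0x18
#define BPF_MEM   0x60
#define BPF_MEMSX 0x80 /* load with sign extension */

/*
 * Hypothetical helper: models only the dst-register case of the LDX
 * branch in is_reg64() after the proposed change. Returns true when the
 * load defines all 64 bits of dst, so no BPF_ZEXT_REG() must follow it.
 */
static bool ldx_dst_is_reg64(uint8_t code)
{
	if (BPF_CLASS(code) != BPF_LDX)
		return false; /* this sketch models LDX only */

	/* A 64-bit load fills the whole register. */
	if (BPF_SIZE(code) == BPF_DW)
		return true;

	/*
	 * A sign-extending load of any size also writes all 64 bits;
	 * zero-extending afterwards would destroy the sign extension.
	 */
	return BPF_MODE(code) == BPF_MEMSX;
}
```

With this rule, a plain BPF_LDX | BPF_MEM | BPF_W load is still treated as a 32-bit definition, so the verifier may insert a zext only when the value is later used as 64-bit, which keeps the optimization discussed above. A BPF_LDX | BPF_MEMSX | BPF_W load is treated as a full 64-bit definition.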