Re: [PATCH bpf-next 01/11] bpf: Disable zero-extension for BPF_MEMSX

Alexei Starovoitov <alexei.starovoitov@xxxxxxxxx> · Tue, 12 Sep 2023 17:09:44 -0700

On Tue, Sep 12, 2023 at 3:49 PM Puranjay Mohan <puranjay12@xxxxxxxxx> wrote:
>
> Hi Alexei,
>
> [...]
>
> > I guess we never clearly defined what 'needs_zext' is supposed to be,
> > so it wouldn't be fair to call 32-bit JITs buggy.
> > But we'd better address this issue now.
> > This 32-bit zeroing after LDX hurts mips64, s390, ppc64, riscv64.
> > I believe all 4 JITs emit a proper zero extension into the 64-bit register
> > using a single CPU instruction,
> > but they also define bpf_jit_needs_zext() as true,
> > so extra BPF_ZEXT_REG() is added by the verifier
> > and it is a pure run-time overhead.
>
> I just realised that these zext instructions will not be a runtime
> overhead because the JITs ignore them.
> For example, s390 does:
> case BPF_LDX | BPF_MEM | BPF_B: /* dst = *(u8 *)(ul) (src + off) */
> case BPF_LDX | BPF_PROBE_MEM | BPF_B:
>         /* llgc %dst,0(off,%src) */
>         EMIT6_DISP_LH(0xe3000000, 0x0090, dst_reg, src_reg, REG_0, off);
>         jit->seen |= SEEN_MEM;
>         if (insn_is_zext(&insn[1]))
>                 insn_count = 2; /* this will skip the next zext instruction */
>         break;
>
> powerpc does after LDX:
> if (size != BPF_DW && insn_is_zext(&insn[i + 1]))
>         addrs[++i] = ctx->idx * 4;

I see. Indeed the 64-bit JITs ignore this special zext insn after LDX.
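
For illustration only, the skip pattern quoted above can be modelled with a
toy instruction stream. This is a minimal sketch, not kernel code: toy_kind,
toy_insn, toy_is_zext() and toy_jit_count() are hypothetical names, and it
assumes one native instruction per toy instruction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy instruction kinds; not the real BPF encoding. */
enum toy_kind { TOY_LDX_B, TOY_LDX_DW, TOY_ZEXT, TOY_OTHER };

struct toy_insn {
	enum toy_kind kind;
};

/* Stand-in for the insn_is_zext() check used by the real JITs. */
static int toy_is_zext(const struct toy_insn *insn)
{
	return insn->kind == TOY_ZEXT;
}

/*
 * Count the native instructions a toy 64-bit JIT would emit, assuming
 * one native instruction per toy instruction. A byte load already
 * zero-extends into the full register, so a zext that directly follows
 * it is consumed without emitting anything.
 */
static size_t toy_jit_count(const struct toy_insn *insns, size_t n)
{
	size_t emitted = 0;

	assert(insns != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		emitted++;
		if (insns[i].kind == TOY_LDX_B && i + 1 < n &&
		    toy_is_zext(&insns[i + 1]))
			i++; /* skip the verifier-inserted zext */
	}
	return emitted;
}
```

In this model a byte load followed by a zext costs one emitted instruction
rather than two, which is why the extra insn is not a run-time overhead on
those JITs.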

> > It's better to remove
> > if (t != SRC_OP)
> >     return BPF_SIZE(code) == BPF_DW;
> > from is_reg64() to avoid adding BPF_ZEXT_REG() insn
> > and fix 32-bit JITs at the same time.
> > Fix the RISCV32, PowerPC32 and x86-32 JITs in the first 3 patches
> > to always zero the upper 32 bits after LDX, and
> > then use a 4th patch to remove these two lines.
>
> I have sent the patches for the above, although I think this optimization
> is useful because zero extension after LDX is only required when the
> loaded value is later used as a 64-bit value. If that is not the case,
> the verifier will not emit the zext, and 32-bit JITs will emit one fewer
> instruction because they expect the verifier to do the zext for them
> where required.

You're correct.
Ok. Let's keep zext for LDX as-is.
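
For illustration only, the trade-off can be seen in a toy model of a 32-bit
JIT, where a 64-bit BPF register lives in two 32-bit halves. This is a
minimal sketch, not kernel code: toy_reg and the toy_*() helpers are
hypothetical names, and it assumes the load writes only the low half.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a BPF register on a 32-bit JIT: two halves. */
struct toy_reg {
	uint32_t lo;
	uint32_t hi;
};

/* Byte load as assumed here: only the low half is written. */
static void toy_ldx_b(struct toy_reg *dst, const uint8_t *src)
{
	assert(dst && src);
	dst->lo = *src;
}

/* The explicit zext, needed only when the upper half is read later. */
static void toy_zext(struct toy_reg *dst)
{
	dst->hi = 0;
}

/* A 32-bit use reads only the low half, so a stale hi is harmless. */
static uint32_t toy_use32(const struct toy_reg *r)
{
	return r->lo;
}

/* A 64-bit use reads both halves, so hi must have been cleared. */
static uint64_t toy_use64(const struct toy_reg *r)
{
	return ((uint64_t)r->hi << 32) | r->lo;
}
```

If the loaded value only ever reaches toy_use32(), the toy_zext() call can
be dropped, which is the instruction the 32-bit JITs save by leaving the
decision to the verifier.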