"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> writes: > On Tue, Apr 09, 2024 at 09:50:38AM +0000, Puranjay Mohan wrote: >> The current implementation of the mov instruction with sign extension >> clobbers the source register because it sign extends the source and then >> moves it to the destination. >> >> Fix this by moving the src to a temporary register before doing the sign >> extension only if src is not an emulated register (on the scratch stack). >> >> Also fix the emit_a32_movsx_r64() to put the register back on scratch >> stack if that register is emulated on stack. > > It would be good to include in the commit message an example or two of > the resulting assembly code so that it's clear what the expected > generation is. Instead, I'm going to have to work it out myself, but > I'm quite sure this is information you already have. > >> Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction") >> Reported-by: syzbot+186522670e6722692d86@xxxxxxxxxxxxxxxxxxxxxxxxx >> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@xxxxxxxxxx/ >> Signed-off-by: Puranjay Mohan <puranjay@xxxxxxxxxx> >> --- >> arch/arm/net/bpf_jit_32.c | 11 +++++++++-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c >> index 1d672457d02f..8fde6ab66cb4 100644 >> --- a/arch/arm/net/bpf_jit_32.c >> +++ b/arch/arm/net/bpf_jit_32.c >> @@ -878,6 +878,13 @@ static inline void emit_a32_mov_r(const s8 dst, const s8 src, const u8 off, >> >> rt = arm_bpf_get_reg32(src, tmp[0], ctx); >> if (off && off != 32) { >> + /* If rt is not a stacked register, move it to tmp, so it doesn't get clobbered by >> + * the shift operations. >> + */ >> + if (rt == src) { >> + emit(ARM_MOV_R(tmp[0], rt), ctx); >> + rt = tmp[0]; >> + } > > This change is adding inefficiency, don't we want to have the JIT > creating as efficient code as possible within the bounds of > reasonableness? > >> emit(ARM_LSL_I(rt, rt, 32 - off), ctx); >> emit(ARM_ASR_I(rt, rt, 32 - off), ctx); > > LSL and ASR can very easily take a different source register to the > destination register. All this needs to be is: > > emit(ARM_LSL_I(tmp[0], rt, 32 - off), ctx); > emit(ARM_ASR_I(tmp[0], tmp[0], 32 - off), ctx); > rt = tmp[0]; > > This will generate: > > lsl tmp[0], src, #32-off > asr tmp[0], tmp[0], #32-off > > and then the store to the output register will occur. > > What about the high-32 bits of the register pair - should that be > taking any value? > >> } > > I notice in passing that the comments are out of sync with the > code - please update the comments along with code changes. > >> @@ -919,15 +926,15 @@ static inline void emit_a32_movsx_r64(const bool is64, const u8 off, const s8 ds >> const s8 *tmp = bpf2a32[TMP_REG_1]; >> const s8 *rt; >> >> - rt = arm_bpf_get_reg64(dst, tmp, ctx); >> - >> emit_a32_mov_r(dst_lo, src_lo, off, ctx); >> if (!is64) { >> if (!ctx->prog->aux->verifier_zext) >> /* Zero out high 4 bytes */ >> emit_a32_mov_i(dst_hi, 0, ctx); >> } else { >> + rt = arm_bpf_get_reg64(dst, tmp, ctx); >> emit(ARM_ASR_I(rt[0], rt[1], 31), ctx); >> + arm_bpf_put_reg64(dst, rt, ctx); >> } >> } > > Why oh why oh why are we emitting code to read the source register > (which may be a load), then write it to the destination (which may > be a store) to only then immediately reload from the destination > to then do the sign extension? This is madness. > > Please... apply some thought to the code generation from the JIT... > or I will remove you from being a maintainer of this code. I spent > time crafting some parts of the JIT to generate efficient code and > I'm seeing that a lot of that work is now being thrown away by > someone who seemingly doesn't care about generating "good" code. > Sorry for this, I also like to make sure the JITs are as efficient as possible. I was too focused on fixing this as fast as possible and didn't pay attention that day. I have reimplemented the whole thing again to make sure all bugs are fixed. The commit message has the generated assembly for all cases: https://lore.kernel.org/all/20240419182832.27707-1-puranjay@xxxxxxxxxx/ Thanks, Puranjay