Re: [PATCH bpf] arm32, bpf: Fix sign-extension mov instruction

Puranjay Mohan <puranjay@xxxxxxxxxx> · Fri, 19 Apr 2024 18:52:31 +0000

"Russell King (Oracle)" <linux@xxxxxxxxxxxxxxx> writes:

> On Tue, Apr 09, 2024 at 09:50:38AM +0000, Puranjay Mohan wrote:
>> The current implementation of the mov instruction with sign extension
>> clobbers the source register because it sign extends the source and then
>> moves it to the destination.
>> 
>> Fix this by moving the src to a temporary register before doing the sign
>> extension only if src is not an emulated register (on the scratch stack).
>> 
>> Also fix the emit_a32_movsx_r64() to put the register back on scratch
>> stack if that register is emulated on stack.
>
> It would be good to include in the commit message an example or two of
> the resulting assembly code so that it's clear what the expected
> generation is. Instead, I'm going to have to work it out myself, but
> I'm quite sure this is information you already have.
>
>> Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction")
>> Reported-by: syzbot+186522670e6722692d86@xxxxxxxxxxxxxxxxxxxxxxxxx
>> Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@xxxxxxxxxx/
>> Signed-off-by: Puranjay Mohan <puranjay@xxxxxxxxxx>
>> ---
>>  arch/arm/net/bpf_jit_32.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
>> index 1d672457d02f..8fde6ab66cb4 100644
>> --- a/arch/arm/net/bpf_jit_32.c
>> +++ b/arch/arm/net/bpf_jit_32.c
>> @@ -878,6 +878,13 @@ static inline void emit_a32_mov_r(const s8 dst, const s8 src, const u8 off,
>>  
>>  	rt = arm_bpf_get_reg32(src, tmp[0], ctx);
>>  	if (off && off != 32) {
>> +		/* If rt is not a stacked register, move it to tmp, so it doesn't get clobbered by
>> +		 * the shift operations.
>> +		 */
>> +		if (rt == src) {
>> +			emit(ARM_MOV_R(tmp[0], rt), ctx);
>> +			rt = tmp[0];
>> +		}
>
> This change is adding inefficiency, don't we want to have the JIT
> creating as efficient code as possible within the bounds of
> reasonableness?
>
>>  		emit(ARM_LSL_I(rt, rt, 32 - off), ctx);
>>  		emit(ARM_ASR_I(rt, rt, 32 - off), ctx);
>
> LSL and ASR can very easily take a different source register to the
> destination register. All this needs to be is:
>
> 		emit(ARM_LSL_I(tmp[0], rt, 32 - off), ctx);
> 		emit(ARM_ASR_I(tmp[0], tmp[0], 32 - off), ctx);
> 		rt = tmp[0];
>
> This will generate:
>
> 		lsl	tmp[0], src, #32-off
> 		asr	tmp[0], tmp[0], #32-off
>
> and then the store to the output register will occur.
>
> What about the high-32 bits of the register pair - should that be
> taking any value?
>
>>  	}
>
> I notice in passing that the comments are out of sync with the
> code - please update the comments along with code changes.
>
>> @@ -919,15 +926,15 @@ static inline void emit_a32_movsx_r64(const bool is64, const u8 off, const s8 ds
>>  	const s8 *tmp = bpf2a32[TMP_REG_1];
>>  	const s8 *rt;
>>  
>> -	rt = arm_bpf_get_reg64(dst, tmp, ctx);
>> -
>>  	emit_a32_mov_r(dst_lo, src_lo, off, ctx);
>>  	if (!is64) {
>>  		if (!ctx->prog->aux->verifier_zext)
>>  			/* Zero out high 4 bytes */
>>  			emit_a32_mov_i(dst_hi, 0, ctx);
>>  	} else {
>> +		rt = arm_bpf_get_reg64(dst, tmp, ctx);
>>  		emit(ARM_ASR_I(rt[0], rt[1], 31), ctx);
>> +		arm_bpf_put_reg64(dst, rt, ctx);
>>  	}
>>  }
>
> Why oh why oh why are we emitting code to read the source register
> (which may be a load), then write it to the destination (which may
> be a store) to only then immediately reload from the destination
> to then do the sign extension? This is madness.
>
> Please... apply some thought to the code generation from the JIT...
> or I will remove you from being a maintainer of this code. I spent
> time crafting some parts of the JIT to generate efficient code and
> I'm seeing that a lot of that work is now being thrown away by
> someone who seemingly doesn't care about generating "good" code.
>

Sorry for this, I also like to make sure the JITs are as efficient as
possible. I was too focused on fixing this as fast as possible and
didn't pay attention that day.

I have reimplemented the whole thing again to make sure all bugs are
fixed. The commit message has the generated assembly for all cases:

https://lore.kernel.org/all/20240419182832.27707-1-puranjay@xxxxxxxxxx/

Thanks,
Puranjay