On Fri, Apr 19, 2024 at 06:28:32PM +0000, Puranjay Mohan wrote: > The current implementation of the mov instruction with sign extension > has the following problems: > > 1. It clobbers the source register if it is not stacked because it > sign extends the source and then moves it to the destination. > 2. If the dst_reg is stacked, the current code doesn't write the value > back in case of 64-bit mov. > 3. There is room for improvement by emitting fewer instructions. > > The steps for fixing this and the instructions emitted by the JIT are > explained below with examples in all combinations: > > Case A: offset == 32: > ===================== > Case A.1: src and dst are stacked registers: > -------------------------------------------- > 1. Load src_lo into tmp_lo > 2. Store tmp_lo into dst_lo > 3. Sign extend tmp_lo into tmp_hi > 4. Store tmp_hi to dst_hi > > Example: r3 = (s32)r3 > r3 is a stacked register > > ldr r6, [r11, #-16] // Load r3_lo into tmp_lo > // str to dst_lo is not emitted because src_lo == dst_lo > asr r7, r6, #31 // Sign extend tmp_lo into tmp_hi > str r7, [r11, #-12] // Store tmp_hi into r3_hi > > Case A.2: src is stacked but dst is not: > ---------------------------------------- > 1. Load src_lo into dst_lo > 2. Sign extend dst_lo into dst_hi > > Example: r6 = (s32)r3 > r6 maps to {ARM_R5, ARM_R4} and r3 is stacked > > ldr r4, [r11, #-16] // Load r3_lo into r6_lo > asr r5, r4, #31 // Sign extend r6_lo into r6_hi > > Case A.3: src is not stacked but dst is stacked: > ------------------------------------------------ > 1. Store src_lo into dst_lo > 2. Sign extend src_lo into tmp_hi > 3. Store tmp_hi to dst_hi > > Example: r3 = (s32)r6 > r3 is stacked and r6 maps to {ARM_R5, ARM_R4} > > str r4, [r11, #-16] // Store r6_lo to r3_lo > asr r7, r4, #31 // Sign extend r6_lo into tmp_hi > str r7, [r11, #-12] // Store tmp_hi to dest_hi > > Case A.4: Both src and dst are not stacked: > ------------------------------------------- > 1. Mov src_lo into dst_lo > 2. Sign extend src_lo into dst_hi > > Example: (bf) r6 = (s32)r6 > r6 maps to {ARM_R5, ARM_R4} > > // Mov not emitted because dst == src > asr r5, r4, #31 // Sign extend r6_lo into r6_hi > > Case B: offset != 32: > ===================== > Case B.1: src and dst are stacked registers: > -------------------------------------------- > 1. Load src_lo into tmp_lo > 2. Sign extend tmp_lo according to offset. > 3. Store tmp_lo into dst_lo > 4. Sign extend tmp_lo into tmp_hi > 5. Store tmp_hi to dst_hi > > Example: r9 = (s8)r3 > r9 and r3 are both stacked registers > > ldr r6, [r11, #-16] // Load r3_lo into tmp_lo > lsl r6, r6, #24 // Sign extend tmp_lo > asr r6, r6, #24 // .. > str r6, [r11, #-56] // Store tmp_lo to r9_lo > asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi > str r7, [r11, #-52] // Store tmp_hi to r9_hi > > Case B.2: src is stacked but dst is not: > ---------------------------------------- > 1. Load src_lo into dst_lo > 2. Sign extend dst_lo according to offset. > 3. Sign extend tmp_lo into dst_hi > > Example: r6 = (s8)r3 > r6 maps to {ARM_R5, ARM_R4} and r3 is stacked > > ldr r4, [r11, #-16] // Load r3_lo to r6_lo > lsl r4, r4, #24 // Sign extend r6_lo > asr r4, r4, #24 // .. > asr r5, r4, #31 // Sign extend r6_lo into r6_hi > > Case B.3: src is not stacked but dst is stacked: > ------------------------------------------------ > 1. Sign extend src_lo into tmp_lo according to offset. > 2. Store tmp_lo into dst_lo. > 3. Sign extend src_lo into tmp_hi. > 4. Store tmp_hi to dst_hi. > > Example: r3 = (s8)r1 > r3 is stacked and r1 maps to {ARM_R3, ARM_R2} > > lsl r6, r2, #24 // Sign extend r1_lo to tmp_lo > asr r6, r6, #24 // .. > str r6, [r11, #-16] // Store tmp_lo to r3_lo > asr r7, r6, #31 // Sign extend tmp_lo to tmp_hi > str r7, [r11, #-12] // Store tmp_hi to r3_hi > > Case B.4: Both src and dst are not stacked: > ------------------------------------------- > 1. Sign extend src_lo into dst_lo according to offset. > 2. Sign extend dst_lo into dst_hi. > > Example: r6 = (s8)r1 > r6 maps to {ARM_R5, ARM_R4} and r1 maps to {ARM_R3, ARM_R2} > > lsl r4, r2, #24 // Sign extend r1_lo to r6_lo > asr r4, r4, #24 // .. > asr r5, r4, #31 // Sign extend r6_lo to r6_hi > > Fixes: fc832653fa0d ("arm32, bpf: add support for sign-extension mov instruction") > Reported-by: syzbot+186522670e6722692d86@xxxxxxxxxxxxxxxxxxxxxxxxx > Closes: https://lore.kernel.org/all/000000000000e9a8d80615163f2a@xxxxxxxxxx/ > Signed-off-by: Puranjay Mohan <puranjay@xxxxxxxxxx> This looks really great, thanks for putting in the extra effort! Reviewed-by: Russell King (Oracle) <rmk+kernel@xxxxxxxxxxxxxxx> Thanks! -- RMK's Patch system: https://www.armlinux.org.uk/developer/patches/ FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!