Xu Kuohai <xukuohai@xxxxxxxxxxxxxxx> writes:

> On 3/21/2024 11:31 PM, Puranjay Mohan wrote:
>> LLVM generates the bpf_addr_space_cast instruction when translating
>> pointers between the native (zero) address space and
>> __attribute__((address_space(N))). The addr_space=1 is reserved as
>> the bpf_arena address space.
>>
>> rY = addr_space_cast(rX, 0, 1) is processed by the verifier and
>> converted to a normal 32-bit move: wX = wY
>>
>> rY = addr_space_cast(rX, 1, 0) has to be converted by the JIT.
>>
>> Here is, in symbolic form, what the JIT is supposed to do.
>> We have:
>> src = [src_upper32][src_lower32] // 64-bit src kernel pointer
>> uvm = [uvm_upper32][uvm_lower32] // 64-bit user_vm_start
>>
>> The JIT has to set the dst reg as follows:
>> dst = [uvm_upper32][src_lower32] // if src_lower32 != 0
>> dst = [00000000000][00000000000] // if src_lower32 == 0
>>
>> Signed-off-by: Puranjay Mohan <puranjay12@xxxxxxxxx>
>> ---
>>  arch/arm64/net/bpf_jit.h                     |  1 +
>>  arch/arm64/net/bpf_jit_comp.c                | 35 ++++++++++++++++++++
>>  tools/testing/selftests/bpf/DENYLIST.aarch64 |  2 --
>>  3 files changed, 36 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
>> index 23b1b34db088..813c3c428fde 100644
>> --- a/arch/arm64/net/bpf_jit.h
>> +++ b/arch/arm64/net/bpf_jit.h
>> @@ -238,6 +238,7 @@
>>  #define A64_LSLV(sf, Rd, Rn, Rm) A64_DATA2(sf, Rd, Rn, Rm, LSLV)
>>  #define A64_LSRV(sf, Rd, Rn, Rm) A64_DATA2(sf, Rd, Rn, Rm, LSRV)
>>  #define A64_ASRV(sf, Rd, Rn, Rm) A64_DATA2(sf, Rd, Rn, Rm, ASRV)
>> +#define A64_RORV(sf, Rd, Rn, Rm) A64_DATA2(sf, Rd, Rn, Rm, RORV)
>>
>>  /* Data-processing (3 source) */
>>  /* Rd = Ra + Rn * Rm */
>> diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
>> index b9b5febe64f0..37c94ebd06b2 100644
>> --- a/arch/arm64/net/bpf_jit_comp.c
>> +++ b/arch/arm64/net/bpf_jit_comp.c
>> @@ -82,6 +82,7 @@ struct jit_ctx {
>>  	__le32 *ro_image;
>>  	u32 stack_size;
>>  	int fpb_offset;
>> +	u64 user_vm_start;
>>  };
>>
>>  struct bpf_plt {
>> @@ -868,6 +869,34 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx,
>>  	/* dst = src */
>>  	case BPF_ALU | BPF_MOV | BPF_X:
>
> is it legal to encode BPF_ADDR_SPACE_CAST with BPF_ALU?

No, the verifier will reject a BPF_ALU MOV that has
off=BPF_ADDR_SPACE_CAST. So a check is not strictly required, but I will
add a BPF_CLASS(code) == BPF_ALU64 check below in the next version.
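
For reference, here is how I think about the semantics described in the
commit message, as a plain C sketch (the helper name is made up and the
u32/u64 types assume <linux/types.h>; this is not code from the patch):

	/* addr_space_cast(rX, 1, 0): keep lo_32_bits(src) and splice in
	 * up_32_bits(user_vm_start); a NULL arena pointer must stay NULL.
	 */
	static u64 arena_cast_to_kernel(u64 src, u64 user_vm_start)
	{
		u32 lo = (u32)src;

		if (lo == 0)
			return 0;

		return (user_vm_start & 0xffffffff00000000ULL) | lo;
	}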
>>  	case BPF_ALU64 | BPF_MOV | BPF_X:
>> +		if (insn->off == BPF_ADDR_SPACE_CAST &&
>> +		    insn->imm == 1U << 16) {
>> +			/* Zero out tmp2 */
>> +			emit(A64_EOR(1, tmp2, tmp2, tmp2), ctx);
>> +
>> +			/* Move lo_32_bits(src) to dst */
>> +			if (dst != src)
>> +				emit(A64_MOV(0, dst, src), ctx);
>> +
>> +			/* Logical shift left by 32 bits */
>> +			emit(A64_LSL(1, dst, dst, 32), ctx);
>> +
>> +			/* Get upper 32 bits of user_vm_start in tmp */
>> +			emit_a64_mov_i(0, tmp, ctx->user_vm_start >> 32, ctx);
>> +
>> +			/* dst |= up_32_bits(user_vm_start) */
>> +			emit(A64_ORR(1, dst, dst, tmp), ctx);
>> +
>> +			/* Rotate by 32 bits to get final result */
>> +			emit_a64_mov_i(0, tmp, 32, ctx);
>> +			emit(A64_RORV(1, dst, dst, tmp), ctx);
>> +
>> +			/* If lo_32_bits(dst) == 0, set dst = tmp2(0) */
>> +			emit(A64_CBZ(0, dst, 2), ctx);
>> +			emit(A64_MOV(1, tmp2, dst), ctx);
>> +			emit(A64_MOV(1, dst, tmp2), ctx);
>
> seems we could simplify it to:
>
> emit_a64_mov_i(0, dst, ctx->user_vm_start >> 32, ctx);
> emit(A64_LSL(1, dst, dst, 32), ctx);
> emit(A64_MOV(0, tmp, src), ctx); // 32-bit mov clears the upper 32 bits
> emit(A64_CBZ(1, tmp, 2), ctx);
> emit(A64_ORR(1, tmp, dst, tmp), ctx);
> emit(A64_MOV(1, dst, tmp), ctx);

Thanks, I will use this in the next version. I will move
emit(A64_MOV(0, tmp, src), ctx); to the top so that, if src and dst are
the same register, src is moved to tmp before it is overwritten through
dst:

emit(A64_MOV(0, tmp, src), ctx); // 32-bit mov clears the upper 32 bits
emit_a64_mov_i(0, dst, ctx->user_vm_start >> 32, ctx);
emit(A64_LSL(1, dst, dst, 32), ctx);
emit(A64_CBZ(1, tmp, 2), ctx);
emit(A64_ORR(1, tmp, dst, tmp), ctx);
emit(A64_MOV(1, dst, tmp), ctx);

>> +		break;
>
> not aligned

Will fix it in the next version.

Thanks for the feedback.

Puranjay
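
P.S. As a sanity check, the reordered sequence computes the following
(a C model of the six emitted instructions, sketch only; the function
name is made up and the types again assume <linux/types.h>):

	static u64 model_addr_space_cast(u64 src, u64 user_vm_start)
	{
		u64 tmp, dst;

		tmp = (u32)src;			/* 32-bit mov clears the upper 32 bits */
		dst = user_vm_start >> 32;	/* mov: upper half of user_vm_start */
		dst <<= 32;			/* lsl: dst = up_32_bits(uvm) << 32 */
		if (tmp)			/* cbz skips the orr when tmp == 0 */
			tmp = dst | tmp;	/* orr: splice the two halves together */
		dst = tmp;			/* dst = 0 for NULL, else the result */

		return dst;
	}

which matches dst = [uvm_upper32][src_lower32] for src_lower32 != 0 and
dst = 0 otherwise, as described in the commit message.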