Re: [PATCH RFC bpf-next v1 2/4] bpf: Introduce load-acquire and store-release instructions

On 12/27/2024 7:07 AM, Peilin Ye wrote:
Hi Xu,

Thanks for reviewing this!

On Tue, Dec 24, 2024 at 06:07:14PM +0800, Xu Kuohai wrote:
On 12/21/2024 9:25 AM, Peilin Ye wrote:
+__AARCH64_INSN_FUNCS(load_acq,  0x3FC08000, 0x08C08000)
+__AARCH64_INSN_FUNCS(store_rel, 0x3FC08000, 0x08808000)

I checked Arm Architecture Reference Manual [1].

Sections C6.2.{168,169,170,371,372,373} state that the Rt2 (bits 10-14)
and Rs (bits 16-20) fields for LDARB/LDARH/LDAR/STLRB/STLRH and the
no-offset type of STLR are fixed to (1).

Section C2.2.2 explains that (1) means a Should-Be-One (SBO) bit.

And the Glossary section says "Arm strongly recommends that software writes
the field as all 1s. If software writes a value that is not all 1s, it must
expect an UNPREDICTABLE or CONSTRAINED UNPREDICTABLE result."

Although the pre-index type of STLR is an exception, it is not used in
this series. Therefore, bits 10-14 and bits 16-20 should be set to all
1s in both the mask and the value.

[1] https://developer.arm.com/documentation/ddi0487/latest/
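
For illustration, that would mean folding the Rt2 (bits 10-14) and Rs
(bits 16-20) fields into both the mask and the value, i.e. taking the
patch's constants and also setting those bits (a sketch derived from the
LDARB/STLRB base opcodes in [1], not the values from this patch):

   __AARCH64_INSN_FUNCS(load_acq,  0x3FDFFC00, 0x08DFFC00)
   __AARCH64_INSN_FUNCS(store_rel, 0x3FDFFC00, 0x089FFC00)

With those fields in the mask, an encoding whose Rs/Rt2 bits are not all
1s would simply not match.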

<...>

+	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT2, insn,
+					    AARCH64_INSN_REG_ZR);
+
+	return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
+					    AARCH64_INSN_REG_ZR);

As explained above, RS and RT2 fields should be fixed to 1s.

I'm already setting Rs and Rt2 to all 1's here, as AARCH64_INSN_REG_ZR
is defined as 31 (0b11111):

	AARCH64_INSN_REG_ZR = 31,


I see, but the setting of fixed bits is somewhat of a waste of JIT time.
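
With the should-be-one bits already part of the base value (as in the
sketch further up), the generator would presumably only need to encode
Rt and Rn, and the two extra encode calls could go away; roughly (with
'reg' and 'base' as placeholder parameter names, not the patch's actual
code):

   /* Rs/Rt2 are already all 1s in the base value; only Rt and Rn vary. */
   insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT, insn,
                                       reg);

   return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RN, insn,
                                       base);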

Similar to how load- and store-exclusive instructions are handled
currently:

   __AARCH64_INSN_FUNCS(load_ex,	0x3F400000, 0x08400000)
   __AARCH64_INSN_FUNCS(store_ex,	0x3F400000, 0x08000000)

For example, in the manual, Rs is all (1)'s for LDXR{,B,H}, and Rt2 is
all (1)'s for both LDXR{,B,H} and STXR{,B,H}.  However, neither the Rs
nor the Rt2 bits are in the mask; the (1) bits are set manually instead,
see aarch64_insn_gen_load_store_ex():

   insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT2, insn,
                                       AARCH64_INSN_REG_ZR);

   return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
                                       state);

(For LDXR{,B,H}, 'state' is A64_ZR, which is just an alias to
AARCH64_INSN_REG_ZR (0b11111).)

- - -

On a related note, I simply grabbed {load,store}_ex's MASK and VALUE,
then set bit 15 and bit 23 in them to make them load-acquire and
store-release:

   +__AARCH64_INSN_FUNCS(load_acq,  0x3FC08000, 0x08C08000)
   +__AARCH64_INSN_FUNCS(store_rel, 0x3FC08000, 0x08808000)
    __AARCH64_INSN_FUNCS(load_ex,   0x3F400000, 0x08400000)
    __AARCH64_INSN_FUNCS(store_ex,  0x3F400000, 0x08000000)

My question is, should we extend {load,store}_ex's MASK to make them
contain BIT(15) and BIT(23) as well?  As-is, aarch64_insn_is_load_ex()
would return true for a load-acquire.

The only user of aarch64_insn_is_load_ex() seems to be this
arm64-specific kprobe code in arch/arm64/kernel/probes/decode-insn.c:

   #ifdef CONFIG_KPROBES
   static bool __kprobes
   is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
   {
           while (scan_start >= scan_end) {
                   /*
                    * atomic region starts from exclusive load and ends with
                    * exclusive store.
                    */
                   if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
                           return false;
                   else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
                           return true;

But I'm not sure yet if changing {load,store}_ex's MASK would affect the
above code.  Do you happen to know the context?


IIUC, this code prevents kprobe from interrupting an LL-SC loop
constructed by an LDXR/STXR pair, as the kprobe trap causes an
unexpected memory access that prevents the exclusive-access loop from
exiting.

Since load-acquire/store-release instructions are not used to construct
LL-SC loops, I think it is safe to exclude them from {load,store}_ex.
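
If so, one way to exclude them might be to widen the {load,store}_ex
masks with just BIT(23), which is 0 for the exclusive forms and 1 for
LDAR/STLR, e.g. (a sketch, not something taken from this series):

   __AARCH64_INSN_FUNCS(load_ex,   0x3FC00000, 0x08400000)
   __AARCH64_INSN_FUNCS(store_ex,  0x3FC00000, 0x08000000)

BIT(15) would presumably need to stay out of those masks, since
LDAXR/STLXR have that bit set and do appear in LL-SC loops.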

+	if (BPF_ATOMIC_TYPE(insn->imm) == BPF_ATOMIC_LOAD)
+		ptr = src;
+	else
+		ptr = dst;
+
+	if (off) {
+		emit_a64_mov_i(true, tmp, off, ctx);
+		emit(A64_ADD(true, tmp, tmp, ptr), ctx);

The mov and add instructions can be optimized to a single A64_ADD_I
if is_addsub_imm(off) is true.
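
A sketch of that suggestion, reusing the tmp/ptr/off/ctx variables from
the hunk above and assuming the JIT's existing is_addsub_imm() and
A64_ADD_I() helpers (the rest of the hunk is omitted):

   if (off) {
           if (is_addsub_imm(off)) {
                   /* off fits an add/sub immediate */
                   emit(A64_ADD_I(true, tmp, ptr, off), ctx);
           } else {
                   emit_a64_mov_i(true, tmp, off, ctx);
                   emit(A64_ADD(true, tmp, tmp, ptr), ctx);
           }
   }

Negative offsets could presumably be handled the same way with
A64_SUB_I(true, tmp, ptr, -off) when is_addsub_imm(-off) holds.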

Thanks!  I'll try this.

I think it's better to split the arm64-related changes into two separate
patches: one for adding the arm64 LDAR/STLR instruction encodings, and
the other for adding JIT support.

Got it, in the next version I'll split this patch into (a) core/verifier
changes, (b) arm64 insn.{h,c} changes, and (c) arm64 JIT compiler
support.

Thanks,
Peilin Ye




