Hi Xu,

Thanks for reviewing this!

On Tue, Dec 24, 2024 at 06:07:14PM +0800, Xu Kuohai wrote:
> On 12/21/2024 9:25 AM, Peilin Ye wrote:
> > +__AARCH64_INSN_FUNCS(load_acq,	0x3FC08000, 0x08C08000)
> > +__AARCH64_INSN_FUNCS(store_rel,	0x3FC08000, 0x08808000)
>
> I checked Arm Architecture Reference Manual [1].
>
> Section C6.2.{168,169,170,371,372,373} state that field Rt2 (bits
> 10-14) and Rs (bits 16-20) for LDARB/LDARH/LDAR/STLRB/STLRH and
> no-offset type STLR instructions are fixed to (1).
>
> Section C2.2.2 explains that (1) means a Should-Be-One (SBO) bit.
>
> And the Glossary section says "Arm strongly recommends that software
> writes the field as all 1s. If software writes a value that is not all
> 1s, it must expect an UNPREDICTABLE or CONSTRAINED UNPREDICTABLE
> result."
>
> Although the pre-index type of STLR is an exception, it is not used in
> this series. Therefore, both bits 10-14 and 16-20 in mask and value
> should be set to 1s.
>
> [1] https://developer.arm.com/documentation/ddi0487/latest/

<...>

> > +	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT2, insn,
> > +					    AARCH64_INSN_REG_ZR);
> > +
> > +	return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
> > +					    AARCH64_INSN_REG_ZR);
>
> As explained above, RS and RT2 fields should be fixed to 1s.

I'm already setting Rs and Rt2 to all 1's here, as AARCH64_INSN_REG_ZR
is defined as 31 (0b11111):

	AARCH64_INSN_REG_ZR = 31,

This is similar to how load-exclusive and store-exclusive instructions
are handled currently:

> >  __AARCH64_INSN_FUNCS(load_ex,	0x3F400000, 0x08400000)
> >  __AARCH64_INSN_FUNCS(store_ex,	0x3F400000, 0x08000000)

For example, in the manual, Rs is all (1)'s for LDXR{,B,H}, and Rt2 is
all (1)'s for both LDXR{,B,H} and STXR{,B,H}.  However, neither the Rs
nor the Rt2 bits are in the mask; instead, the (1) bits are set
manually, see aarch64_insn_gen_load_store_ex():

	insn = aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RT2, insn,
					    AARCH64_INSN_REG_ZR);

	return aarch64_insn_encode_register(AARCH64_INSN_REGTYPE_RS, insn,
					    state);

(For LDXR{,B,H}, 'state' is A64_ZR, which is just an alias to
AARCH64_INSN_REG_ZR (0b11111).)

- - -

On a related note, I simply grabbed {load,store}_ex's MASK and VALUE,
then set their 15th and 23rd bits to make them load-acquire and
store-release:

+__AARCH64_INSN_FUNCS(load_acq,	0x3FC08000, 0x08C08000)
+__AARCH64_INSN_FUNCS(store_rel,	0x3FC08000, 0x08808000)
 __AARCH64_INSN_FUNCS(load_ex,	0x3F400000, 0x08400000)
 __AARCH64_INSN_FUNCS(store_ex,	0x3F400000, 0x08000000)

My question is: should we extend {load,store}_ex's MASK so that it also
covers BIT(15) and BIT(23)?  As-is, aarch64_insn_is_load_ex() would
return true for a load-acquire (see the sketch below).

The only user of aarch64_insn_is_load_ex() seems to be this
arm64-specific kprobe code in arch/arm64/kernel/probes/decode-insn.c:

#ifdef CONFIG_KPROBES
static bool __kprobes
is_probed_address_atomic(kprobe_opcode_t *scan_start, kprobe_opcode_t *scan_end)
{
	while (scan_start >= scan_end) {
		/*
		 * atomic region starts from exclusive load and ends with
		 * exclusive store.
		 */
		if (aarch64_insn_is_store_ex(le32_to_cpu(*scan_start)))
			return false;
		else if (aarch64_insn_is_load_ex(le32_to_cpu(*scan_start)))
			return true;

But I'm not sure yet if changing {load,store}_ex's MASK would affect the
above code.  Do you happen to know the context?
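To illustrate the "would return true" part: below is a small,
standalone user-space sketch (not kernel code) of the
(insn & MASK) == VALUE test that __AARCH64_INSN_FUNCS() generates.
The 0xC8DFFC20 opcode is my hand-assembled encoding of "LDAR X0, [X1]",
so please double-check it against the manual:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Same check as the aarch64_insn_is_*() helpers: (insn & mask) == value */
static bool matches(uint32_t insn, uint32_t mask, uint32_t value)
{
	return (insn & mask) == value;
}

int main(void)
{
	uint32_t ldar = 0xC8DFFC20;	/* LDAR X0, [X1] (hand-assembled) */

	/* load_ex:  mask 0x3F400000, value 0x08400000 */
	printf("is_load_ex:  %d\n", matches(ldar, 0x3F400000, 0x08400000));
	/* load_acq: mask 0x3FC08000, value 0x08C08000 */
	printf("is_load_acq: %d\n", matches(ldar, 0x3FC08000, 0x08C08000));
	return 0;
}

With the current masks this prints 1 for both: load_ex's mask doesn't
cover bits 15 and 23, so the LDAR opcode satisfies the load_ex test too.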
> > +	if (BPF_ATOMIC_TYPE(insn->imm) == BPF_ATOMIC_LOAD)
> > +		ptr = src;
> > +	else
> > +		ptr = dst;
> > +
> > +	if (off) {
> > +		emit_a64_mov_i(true, tmp, off, ctx);
> > +		emit(A64_ADD(true, tmp, tmp, ptr), ctx);
>
> The mov and add instructions can be optimized to a single A64_ADD_I
> if is_addsub_imm(off) is true.

Thanks!  I'll try this.  (A rough sketch of what I have in mind is
appended below my signature.)

> I think it's better to split the arm64-related changes into two
> separate patches: one for adding the arm64 LDAR/STLR instruction
> encodings, and the other for adding JIT support.

Got it, in the next version I'll split this patch into (a) core/verifier
changes, (b) arm64 insn.{h,c} changes, and (c) arm64 JIT compiler
support.

Thanks,
Peilin Ye
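- - -

For reference, here is a rough, untested sketch of what the
is_addsub_imm() change could look like, mirroring how the
BPF_ALU | BPF_ADD | BPF_K case handles immediates elsewhere in
bpf_jit_comp.c; I'm assuming the existing A64_ADD_I()/A64_SUB_I()
helpers can simply be reused here:

	if (off) {
		if (is_addsub_imm(off)) {
			/* off fits in an ADD immediate */
			emit(A64_ADD_I(true, tmp, ptr, off), ctx);
		} else if (is_addsub_imm(-off)) {
			/* negative off, but -off fits in a SUB immediate */
			emit(A64_SUB_I(true, tmp, ptr, -off), ctx);
		} else {
			/* fall back to the current mov + add sequence */
			emit_a64_mov_i(true, tmp, off, ctx);
			emit(A64_ADD(true, tmp, tmp, ptr), ctx);
		}
		/* then use tmp as the address, as after the existing A64_ADD */
	}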