Hi Andrii,

On Wed, Oct 09, 2024 at 04:54:25PM -0700, Andrii Nakryiko wrote:
> On Mon, Sep 9, 2024 at 12:21 AM Liao Chang <liaochang1@xxxxxxxxxx> wrote:
> I'm curious what's the status of this patch? It has received no
> comments in the last month. Can someone on the ARM64 side of things
> please take a look? (or maybe it was applied to some tree and there
> was just no notification?)
>
> This is a very useful performance optimization for uprobe tracing on
> ARM64, so it would be nice to get it in during the current release
> cycle. Thank you!

Sorry, I got busy chasing up a bunch of bugs and hadn't gotten round to
this yet.

I've replied with a couple of minor comments and an ack, and I reckon
we can queue this up this cycle. Usually this sort of thing starts to
get queued around -rc3.

Mark.

> > diff --git a/arch/arm64/include/asm/insn.h b/arch/arm64/include/asm/insn.h
> > index 8c0a36f72d6f..dd530d5c3d67 100644
> > --- a/arch/arm64/include/asm/insn.h
> > +++ b/arch/arm64/include/asm/insn.h
> > @@ -549,6 +549,12 @@ static __always_inline bool aarch64_insn_uses_literal(u32 insn)
> >  	       aarch64_insn_is_prfm_lit(insn);
> >  }
> >  
> > +static __always_inline bool aarch64_insn_is_nop(u32 insn)
> > +{
> > +	return aarch64_insn_is_hint(insn) &&
> > +	       ((insn & 0xFE0) == AARCH64_INSN_HINT_NOP);
> > +}
> > +
> >  enum aarch64_insn_encoding_class aarch64_get_insn_class(u32 insn);
> >  u64 aarch64_insn_decode_immediate(enum aarch64_insn_imm_type type, u32 insn);
> >  u32 aarch64_insn_encode_immediate(enum aarch64_insn_imm_type type,
> > diff --git a/arch/arm64/kernel/probes/decode-insn.c b/arch/arm64/kernel/probes/decode-insn.c
> > index 968d5fffe233..be54539e309e 100644
> > --- a/arch/arm64/kernel/probes/decode-insn.c
> > +++ b/arch/arm64/kernel/probes/decode-insn.c
> > @@ -75,6 +75,15 @@ static bool __kprobes aarch64_insn_is_steppable(u32 insn)
> >  enum probe_insn __kprobes
> >  arm_probe_decode_insn(probe_opcode_t insn, struct arch_probe_insn *api)
> >  {
> > +	/*
> > +	 * While a 'nop' instruction can execute in the out-of-line slot,
> > +	 * simulating it in the breakpoint handler offers better performance.
> > +	 */
> > +	if (aarch64_insn_is_nop(insn)) {
> > +		api->handler = simulate_nop;
> > +		return INSN_GOOD_NO_SLOT;
> > +	}
> > +
> >  	/*
> >  	 * Instructions reading or modifying the PC won't work from the XOL
> >  	 * slot.
> > diff --git a/arch/arm64/kernel/probes/simulate-insn.c b/arch/arm64/kernel/probes/simulate-insn.c
> > index 22d0b3252476..5e4f887a074c 100644
> > --- a/arch/arm64/kernel/probes/simulate-insn.c
> > +++ b/arch/arm64/kernel/probes/simulate-insn.c
> > @@ -200,3 +200,14 @@ simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs)
> >  
> >  	instruction_pointer_set(regs, instruction_pointer(regs) + 4);
> >  }
> > +
> > +void __kprobes
> > +simulate_nop(u32 opcode, long addr, struct pt_regs *regs)
> > +{
> > +	/*
> > +	 * Compared to instruction_pointer_set(), it offers better
> > +	 * compatibility with single-stepping and execution in target
> > +	 * guarded memory.
> > +	 */
> > +	arm64_skip_faulting_instruction(regs, AARCH64_INSN_SIZE);
> > +}
> > diff --git a/arch/arm64/kernel/probes/simulate-insn.h b/arch/arm64/kernel/probes/simulate-insn.h
> > index e065dc92218e..efb2803ec943 100644
> > --- a/arch/arm64/kernel/probes/simulate-insn.h
> > +++ b/arch/arm64/kernel/probes/simulate-insn.h
> > @@ -16,5 +16,6 @@ void simulate_cbz_cbnz(u32 opcode, long addr, struct pt_regs *regs);
> >  void simulate_tbz_tbnz(u32 opcode, long addr, struct pt_regs *regs);
> >  void simulate_ldr_literal(u32 opcode, long addr, struct pt_regs *regs);
> >  void simulate_ldrsw_literal(u32 opcode, long addr, struct pt_regs *regs);
> > +void simulate_nop(u32 opcode, long addr, struct pt_regs *regs);
> >  
> >  #endif /* _ARM_KERNEL_KPROBES_SIMULATE_INSN_H */
> > --
> > 2.34.1
> > 
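
As an aside for anyone reading along: aarch64_insn_is_nop() keys on the
A64 HINT-space encoding, where bits [11:5] (CRm:op2) select the hint and
NOP is hint #0. Below is a standalone userspace sketch of the same decode
logic, not part of the patch; the HINT match mask/value are mirrored from
the __AARCH64_INSN_FUNCS(hint, ...) definition in
arch/arm64/include/asm/insn.h, and the constant names are local to the
sketch:

  #include <stdint.h>
  #include <stdio.h>

  /*
   * Mirrored for illustration: HINT instructions match 0xd503201f under
   * mask 0xfffff01f; CRm:op2 lives in bits [11:5], and NOP is hint #0.
   */
  #define HINT_MASK  0xfffff01fu
  #define HINT_VALUE 0xd503201fu
  #define HINT_NOP   0x0u

  static int is_hint(uint32_t insn)
  {
          return (insn & HINT_MASK) == HINT_VALUE;
  }

  /* Same shape as the patch's aarch64_insn_is_nop(). */
  static int is_nop(uint32_t insn)
  {
          return is_hint(insn) && ((insn & 0xfe0) == HINT_NOP);
  }

  int main(void)
  {
          printf("nop   0xd503201f -> %d\n", is_nop(0xd503201f)); /* 1 */
          printf("yield 0xd503203f -> %d\n", is_nop(0xd503203f)); /* 0 */
          printf("bti c 0xd503245f -> %d\n", is_nop(0xd503245f)); /* 0 */
          return 0;
  }

Only a genuine NOP takes the new simulate_nop path; other hints (YIELD,
BTI, etc.) still go through the existing decode handling.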