On 2023/1/9 21:58, Mark Rutland wrote:
> Architectures without dynamic ftrace trampolines incur an overhead when
> multiple ftrace_ops are enabled with distinct filters. In these cases,
> each call site calls a common trampoline which uses
> ftrace_ops_list_func() to iterate over all enabled ftrace functions, and
> so incurs an overhead relative to the size of this list (including RCU
> protection overhead).
>
> Architectures with dynamic ftrace trampolines avoid this overhead for
> call sites which have a single associated ftrace_ops. In these cases,
> the dynamic trampoline is customized to branch directly to the relevant
> ftrace function, avoiding the list overhead.
>
> On some architectures it's impractical and/or undesirable to implement
> dynamic ftrace trampolines. For example, arm64 has limited branch ranges
> and cannot always directly branch from a call site to an arbitrary
> address (e.g. from a kernel text address to an arbitrary module
> address). Calls from modules to core kernel text can be indirected via
> PLTs (allocated at module load time) to address this, but the same is
> not possible for calls from core kernel text.
>
> Using an indirect branch from a call site to an arbitrary trampoline is
> possible, but requires several more instructions in the function
> prologue (or immediately before it), and/or comes with far more complex
> requirements for patching.
>
> Instead, this patch adds a new option, where an architecture can
> associate each call site with a pointer to an ftrace_ops, placed at a
> fixed offset from the call site. A shared trampoline can recover this
> pointer and call ftrace_ops::func() without needing to go via
> ftrace_ops_list_func(), avoiding the associated overhead.
>
> This avoids issues with branch range limitations, and avoids the need to
> allocate and manipulate dynamic trampolines, making it far simpler to
> implement and maintain, while having similar performance
> characteristics.
>
> Note that this allows for dynamic ftrace_ops to be invoked directly from
> an architecture's ftrace_caller trampoline, whereas existing code forces
> the use of ftrace_ops_get_list_func(), which is in part necessary to
> permit the ftrace_ops to be freed once unregistered *and* to avoid
> branch/address-generation range limitations on some architectures (e.g.
> where ops->func is a module address, and may be outside of the direct
> branch range for callsites within the main kernel image).
>
> The CALL_OPS approach avoids these problems and is safe as:
>
> * The existing synchronization in ftrace_shutdown() using
>   synchronize_rcu_tasks_rude() (and synchronize_rcu_tasks()) ensures
>   that no tasks hold a stale reference to an ftrace_ops (e.g. in the
>   middle of the ftrace_caller trampoline, or while invoking
>   ftrace_ops::func), when that ftrace_ops is unregistered.
>
>   Arguably this could also be relied upon for the existing scheme,
>   permitting dynamic ftrace_ops to be invoked directly when ops->func is
>   in range, but this will require additional logic to handle branch
>   range limitations, and is not handled by this patch.
>
> * Each callsite's ftrace_ops pointer literal can hold any valid kernel
>   address, and is updated atomically. As an architecture's ftrace_caller
>   trampoline will atomically load the ops pointer then dereference
>   ops->func, there is no risk of invoking ops->func with a mismatched
>   ops pointer, and updates to the ops pointer do not require special
>   care.
>
> A subsequent patch will implement architecture support for arm64. There
> should be no functional change as a result of this patch alone.
>
> Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Florent Revest <revest@xxxxxxxxxxxx>
> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> ---
>  include/linux/ftrace.h |  15 +++++-
>  kernel/trace/Kconfig   |   7 +++
>  kernel/trace/ftrace.c  | 109 +++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 124 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
> index 99f1146614c04..5eeb2776124c5 100644
> --- a/include/linux/ftrace.h
> +++ b/include/linux/ftrace.h
> @@ -57,6 +57,9 @@ void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip);
>  void arch_ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
>  			       struct ftrace_ops *op, struct ftrace_regs *fregs);
>  #endif
> +extern const struct ftrace_ops ftrace_nop_ops;
> +extern const struct ftrace_ops ftrace_list_ops;
> +struct ftrace_ops *ftrace_find_unique_ops(struct dyn_ftrace *rec);

Hi Mark,

This patch has build issues on x86:

  CC      mm/readahead.o
In file included from include/linux/perf_event.h:52:0,
                 from arch/x86/events/amd/lbr.c:2:
include/linux/ftrace.h:62:50: error: ‘struct dyn_ftrace’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
 struct ftrace_ops *ftrace_find_unique_ops(struct dyn_ftrace *rec);

We need a forward declaration of 'struct dyn_ftrace' here.

Thanks,
Huafei

>  #endif /* CONFIG_FUNCTION_TRACER */
>
>  /* Main tracing buffer and events set up */
> @@ -563,6 +566,8 @@ bool is_ftrace_trampoline(unsigned long addr);
>   * IPMODIFY - the record allows for the IP address to be changed.
>   * DISABLED - the record is not ready to be touched yet
>   * DIRECT   - there is a direct function to call
> + * CALL_OPS - the record can use callsite-specific ops
> + * CALL_OPS_EN - the function is set up to use callsite-specific ops
>   *
>   * When a new ftrace_ops is registered and wants a function to save
>   * pt_regs, the rec->flags REGS is set. When the function has been
> @@ -580,9 +585,11 @@ enum {
>  	FTRACE_FL_DISABLED	= (1UL << 25),
>  	FTRACE_FL_DIRECT	= (1UL << 24),
>  	FTRACE_FL_DIRECT_EN	= (1UL << 23),
> +	FTRACE_FL_CALL_OPS	= (1UL << 22),
> +	FTRACE_FL_CALL_OPS_EN	= (1UL << 21),
>  };
>
> -#define FTRACE_REF_MAX_SHIFT	23
> +#define FTRACE_REF_MAX_SHIFT	21
>  #define FTRACE_REF_MAX		((1UL << FTRACE_REF_MAX_SHIFT) - 1)
>
>  #define ftrace_rec_count(rec)	((rec)->flags & FTRACE_REF_MAX)
> @@ -820,7 +827,8 @@ static inline int ftrace_init_nop(struct module *mod, struct dyn_ftrace *rec)
>   */
>  extern int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr);
>
> -#ifdef CONFIG_DYNAMIC_FTRACE_WITH_REGS
> +#if defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS) || \
> +	defined(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS)
>  /**
>   * ftrace_modify_call - convert from one addr to another (no nop)
>   * @rec: the call site record (e.g. mcount/fentry)
> @@ -833,6 +841,9 @@ extern int ftrace_make_call(struct dyn_ftrace *rec, unsigned long addr);
>   * what we expect it to be, and then on success of the compare,
>   * it should write to the location.
>   *
> + * When using call ops, this is called when the associated ops change, even
> + * when (addr == old_addr).
> + *
>   * The code segment at @rec->ip should be a caller to @old_addr
>   *
>   * Return must be:
> diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
> index 197545241ab83..5df427a2321df 100644
> --- a/kernel/trace/Kconfig
> +++ b/kernel/trace/Kconfig
> @@ -42,6 +42,9 @@ config HAVE_DYNAMIC_FTRACE_WITH_REGS
>  config HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  	bool
>
> +config HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
> +	bool
> +
>  config HAVE_DYNAMIC_FTRACE_WITH_ARGS
>  	bool
>  	help
> @@ -257,6 +260,10 @@ config DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  	depends on DYNAMIC_FTRACE_WITH_REGS
>  	depends on HAVE_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>
> +config DYNAMIC_FTRACE_WITH_CALL_OPS
> +	def_bool y
> +	depends on HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
> +
>  config DYNAMIC_FTRACE_WITH_ARGS
>  	def_bool y
>  	depends on DYNAMIC_FTRACE
> diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
> index 442438b93fe98..c0b219ab89d2f 100644
> --- a/kernel/trace/ftrace.c
> +++ b/kernel/trace/ftrace.c
> @@ -125,6 +125,33 @@ struct ftrace_ops global_ops;
>  void ftrace_ops_list_func(unsigned long ip, unsigned long parent_ip,
>  			  struct ftrace_ops *op, struct ftrace_regs *fregs);
>
> +#ifdef CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS
> +/*
> + * Stub used to invoke the list ops without requiring a separate trampoline.
> + */
> +const struct ftrace_ops ftrace_list_ops = {
> +	.func	= ftrace_ops_list_func,
> +	.flags	= FTRACE_OPS_FL_STUB,
> +};
> +
> +static void ftrace_ops_nop_func(unsigned long ip, unsigned long parent_ip,
> +				struct ftrace_ops *op,
> +				struct ftrace_regs *fregs)
> +{
> +	/* do nothing */
> +}
> +
> +/*
> + * Stub used when a call site is disabled. May be called transiently by threads
> + * which have made it into ftrace_caller but haven't yet recovered the ops at
> + * the point the call site is disabled.
> + */
> +const struct ftrace_ops ftrace_nop_ops = {
> +	.func	= ftrace_ops_nop_func,
> +	.flags	= FTRACE_OPS_FL_STUB,
> +};
> +#endif
> +
>  static inline void ftrace_ops_init(struct ftrace_ops *ops)
>  {
>  #ifdef CONFIG_DYNAMIC_FTRACE
> @@ -1814,6 +1841,18 @@ static bool __ftrace_hash_rec_update(struct ftrace_ops *ops,
>  			 * if rec count is zero.
>  			 */
>  		}
> +
> +		/*
> +		 * If the rec has a single associated ops, and ops->func can be
> +		 * called directly, allow the call site to call via the ops.
> +		 */
> +		if (IS_ENABLED(CONFIG_DYNAMIC_FTRACE_WITH_CALL_OPS) &&
> +		    ftrace_rec_count(rec) == 1 &&
> +		    ftrace_ops_get_func(ops) == ops->func)
> +			rec->flags |= FTRACE_FL_CALL_OPS;
> +		else
> +			rec->flags &= ~FTRACE_FL_CALL_OPS;
> +
>  		count++;
>
>  		/* Must match FTRACE_UPDATE_CALLS in ftrace_modify_all_code() */
> @@ -2108,8 +2147,9 @@ void ftrace_bug(int failed, struct dyn_ftrace *rec)
>  		struct ftrace_ops *ops = NULL;
>
>  		pr_info("ftrace record flags: %lx\n", rec->flags);
> -		pr_cont(" (%ld)%s", ftrace_rec_count(rec),
> -			rec->flags & FTRACE_FL_REGS ? " R" : "  ");
> +		pr_cont(" (%ld)%s%s", ftrace_rec_count(rec),
> +			rec->flags & FTRACE_FL_REGS ? " R" : "  ",
> +			rec->flags & FTRACE_FL_CALL_OPS ? " O" : "  ");
>  		if (rec->flags & FTRACE_FL_TRAMP_EN) {
>  			ops = ftrace_find_tramp_ops_any(rec);
>  			if (ops) {
> @@ -2177,6 +2217,7 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
>  			 * want the direct enabled (it will be done via the
>  			 * direct helper). But if DIRECT_EN is set, and
>  			 * the count is not one, we need to clear it.
> +
>  			 */
>  			if (ftrace_rec_count(rec) == 1) {
>  				if (!(rec->flags & FTRACE_FL_DIRECT) !=
>  				    !(rec->flags & FTRACE_FL_DIRECT_EN))
>  					flag |= FTRACE_FL_DIRECT;
>  			} else if (rec->flags & FTRACE_FL_DIRECT_EN) {
>  				flag |= FTRACE_FL_DIRECT;
>  			}
> +
> +			/*
> +			 * Ops calls are special, as count matters.
> +			 * As with direct calls, they must only be enabled when count
> +			 * is one, otherwise they'll be handled via the list ops.
> +			 */
> +			if (ftrace_rec_count(rec) == 1) {
> +				if (!(rec->flags & FTRACE_FL_CALL_OPS) !=
> +				    !(rec->flags & FTRACE_FL_CALL_OPS_EN))
> +					flag |= FTRACE_FL_CALL_OPS;
> +			} else if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
> +				flag |= FTRACE_FL_CALL_OPS;
> +			}
>  		}
>
>  	/* If the state of this record hasn't changed, then do nothing */
> @@ -2229,6 +2283,21 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
>  				rec->flags &= ~FTRACE_FL_DIRECT_EN;
>  			}
>  		}
> +
> +		if (flag & FTRACE_FL_CALL_OPS) {
> +			if (ftrace_rec_count(rec) == 1) {
> +				if (rec->flags & FTRACE_FL_CALL_OPS)
> +					rec->flags |= FTRACE_FL_CALL_OPS_EN;
> +				else
> +					rec->flags &= ~FTRACE_FL_CALL_OPS_EN;
> +			} else {
> +				/*
> +				 * Can only call directly if there's
> +				 * only one set of associated ops.
> +				 */
> +				rec->flags &= ~FTRACE_FL_CALL_OPS_EN;
> +			}
> +		}
>  	}
>
>  	/*
> @@ -2258,7 +2327,8 @@ static int ftrace_check_record(struct dyn_ftrace *rec, bool enable, bool update)
>  	 * and REGS states. The _EN flags must be disabled though.
>  	 */
>  	rec->flags &= ~(FTRACE_FL_ENABLED | FTRACE_FL_TRAMP_EN |
> -			FTRACE_FL_REGS_EN | FTRACE_FL_DIRECT_EN);
> +			FTRACE_FL_REGS_EN | FTRACE_FL_DIRECT_EN |
> +			FTRACE_FL_CALL_OPS_EN);
>  	}
>
>  	ftrace_bug_type = FTRACE_BUG_NOP;
> @@ -2431,6 +2501,25 @@ ftrace_find_tramp_ops_new(struct dyn_ftrace *rec)
>  	return NULL;
>  }
>
> +struct ftrace_ops *
> +ftrace_find_unique_ops(struct dyn_ftrace *rec)
> +{
> +	struct ftrace_ops *op, *found = NULL;
> +	unsigned long ip = rec->ip;
> +
> +	do_for_each_ftrace_op(op, ftrace_ops_list) {
> +
> +		if (hash_contains_ip(ip, op->func_hash)) {
> +			if (found)
> +				return NULL;
> +			found = op;
> +		}
> +
> +	} while_for_each_ftrace_op(op);
> +
> +	return found;
> +}
> +
>  #ifdef CONFIG_DYNAMIC_FTRACE_WITH_DIRECT_CALLS
>  /* Protected by rcu_tasks for reading, and direct_mutex for writing */
>  static struct ftrace_hash *direct_functions = EMPTY_HASH;
> @@ -3780,11 +3869,12 @@ static int t_show(struct seq_file *m, void *v)
>  	if (iter->flags & FTRACE_ITER_ENABLED) {
>  		struct ftrace_ops *ops;
>
> -		seq_printf(m, " (%ld)%s%s%s",
> +		seq_printf(m, " (%ld)%s%s%s%s",
>  			   ftrace_rec_count(rec),
>  			   rec->flags & FTRACE_FL_REGS ? " R" : "  ",
>  			   rec->flags & FTRACE_FL_IPMODIFY ? " I" : "  ",
> -			   rec->flags & FTRACE_FL_DIRECT ? " D" : "  ");
> +			   rec->flags & FTRACE_FL_DIRECT ? " D" : "  ",
> +			   rec->flags & FTRACE_FL_CALL_OPS ? " O" : "  ");
>  		if (rec->flags & FTRACE_FL_TRAMP_EN) {
>  			ops = ftrace_find_tramp_ops_any(rec);
>  			if (ops) {
> @@ -3800,6 +3890,15 @@ static int t_show(struct seq_file *m, void *v)
>  		} else {
>  			add_trampoline_func(m, NULL, rec);
>  		}
> +		if (rec->flags & FTRACE_FL_CALL_OPS_EN) {
> +			ops = ftrace_find_unique_ops(rec);
> +			if (ops) {
> +				seq_printf(m, "\tops: %pS (%pS)",
> +					   ops, ops->func);
> +			} else {
> +				seq_puts(m, "\tops: ERROR!");
> +			}
> +		}
>  		if (rec->flags & FTRACE_FL_DIRECT) {
>  			unsigned long direct;
>