On Fri, 13 Jan 2023 18:03:51 +0000 Mark Rutland <mark.rutland@xxxxxxx> wrote: > Architectures without dynamic ftrace trampolines incur an overhead when > multiple ftrace_ops are enabled with distinct filters. in these cases, > each call site calls a common trampoline which uses > ftrace_ops_list_func() to iterate over all enabled ftrace functions, and > so incurs an overhead relative to the size of this list (including RCU > protection overhead). > > Architectures with dynamic ftrace trampolines avoid this overhead for > call sites which have a single associated ftrace_ops. In these cases, > the dynamic trampoline is customized to branch directly to the relevant > ftrace function, avoiding the list overhead. > > On some architectures it's impractical and/or undesirable to implement > dynamic ftrace trampolines. For example, arm64 has limited branch ranges > and cannot always directly branch from a call site to an arbitrary > address (e.g. from a kernel text address to an arbitrary module > address). Calls from modules to core kernel text can be indirected via > PLTs (allocated at module load time) to address this, but the same is > not possible from calls from core kernel text. > > Using an indirect branch from a call site to an arbitrary trampoline is > possible, but requires several more instructions in the function > prologue (or immediately before it), and/or comes with far more complex > requirements for patching. > > Instead, this patch adds a new option, where an architecture can > associate each call site with a pointer to an ftrace_ops, placed at a > fixed offset from the call site. A shared trampoline can recover this > pointer and call ftrace_ops::func() without needing to go via > ftrace_ops_list_func(), avoiding the associated overhead. > > This avoids issues with branch range limitations, and avoids the need to > allocate and manipulate dynamic trampolines, making it far simpler to > implement and maintain, while having similar performance > characteristics. > > Note that this allows for dynamic ftrace_ops to be invoked directly from > an architecture's ftrace_caller trampoline, whereas existing code forces > the use of ftrace_ops_get_list_func(), which is in part necessary to > permit the ftrace_ops to be freed once unregistereed *and* to avoid > branch/address-generation range limitation on some architectures (e.g. > where ops->func is a module address, and may be outside of the direct > branch range for callsites within the main kernel image). > > The CALL_OPS approach avoids this problems and is safe as: > > * The existing synchronization in ftrace_shutdown() using > ftrace_shutdown() using synchronize_rcu_tasks_rude() (and > synchronize_rcu_tasks()) ensures that no tasks hold a stale reference > to an ftrace_ops (e.g. in the middle of the ftrace_caller trampoline, > or while invoking ftrace_ops::func), when that ftrace_ops is > unregistered. > > Arguably this could also be relied upon for the existing scheme, > permitting dynamic ftrace_ops to be invoked directly when ops->func is > in range, but this will require additional logic to handle branch > range limitations, and is not handled by this patch. > > * Each callsite's ftrace_ops pointer literal can hold any valid kernel > address, and is updated atomically. As an architecture's ftrace_caller > trampoline will atomically load the ops pointer then derefrence > ops->func, there is no risk of invoking ops->func with a mismatches > ops pointer, and updates to the ops pointer do not require special > care. > > A subsequent patch will implement architectures support for arm64. There > should be no functional change as a result of this patch alone. > > Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx> > Cc: Catalin Marinas <catalin.marinas@xxxxxxx> > Cc: Florent Revest <revest@xxxxxxxxxxxx> > Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx> > Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx> > Cc: Steven Rostedt <rostedt@xxxxxxxxxxx> > Cc: Will Deacon <will@xxxxxxxxxx> > --- > Looks good. Looking through it, I don't see any issues. Although I didn't test it ;-) I probably should, but in the mean time (as my tests are currently broken)... Reviewed-by: Steven Rostedt (Google) <rostedt@xxxxxxxxxxx> -- Steve