Re: [RFC PATCH v3 2/3] tracing: Introduce tracepoint_is_syscall()

On 2024-10-26 20:08, Steven Rostedt wrote:
On Sat, 26 Oct 2024 11:46:28 -0400
Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx> wrote:

Introduce a "syscall" flag within the extended structure to know whether
a tracepoint needs rcu tasks trace grace period before reclaim.
This can be queried using tracepoint_is_syscall().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@xxxxxxxxxxxx>
Cc: Michael Jeanson <mjeanson@xxxxxxxxxxxx>
Cc: Steven Rostedt <rostedt@xxxxxxxxxxx>
Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
Cc: Yonghong Song <yhs@xxxxxx>
Cc: Paul E. McKenney <paulmck@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Alexander Shishkin <alexander.shishkin@xxxxxxxxxxxxxxx>
Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
Cc: Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx>
Cc: bpf@xxxxxxxxxxxxxxx
Cc: Joel Fernandes <joel@xxxxxxxxxxxxxxxxx>
Cc: Jordan Rife <jrife@xxxxxxxxxx>
---
  include/linux/tracepoint-defs.h |  2 ++
  include/linux/tracepoint.h      | 24 ++++++++++++++++++++++++
  include/trace/define_trace.h    |  2 +-
  3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/include/linux/tracepoint-defs.h b/include/linux/tracepoint-defs.h
index 967c08d9da84..53119e074c87 100644
--- a/include/linux/tracepoint-defs.h
+++ b/include/linux/tracepoint-defs.h
@@ -32,6 +32,8 @@ struct tracepoint_func {
  struct tracepoint_ext {
  	int (*regfunc)(void);
  	void (*unregfunc)(void);
+	/* Flags. */
+	unsigned int syscall:1;

> I wonder if we should call it "sleepable" instead? For this patch set
> do we really care if it's a system call or not? It's really if the
> tracepoint is sleepable or not that's the issue. System calls are just
> one user of it, there may be more in the future, and the changes to BPF
> will still be needed.

Remember that syscall tracepoint probes are allowed to handle page
faults, but should not generally block; otherwise they would postpone the
grace periods of all RCU Tasks Trace users.

So naming this "sleepable" would be misleading, because probes are
not allowed to block in general, only to handle page faults.

If we look at the history of this tracepoint feature, we went with
the following naming over the various versions of the patch series:

1) Sleepable tracepoints: until we understood that we just want to
   allow page faults, not general sleeping, so we needed to change
   the name,

2) Faultable tracepoints: until Linus requested that we aim for
   something that is specific to system calls, rather than a generic
   thing.

   https://lore.kernel.org/lkml/CAHk-=wggDLDeTKbhb5hh--x=-DQd69v41137M72m6NOTmbD-cw@xxxxxxxxxxxxxx/

3) Syscall tracepoints: This is what we currently have.

> Other than that, I think this could work.

Calling this field "sleepable" would be misleading. Calling it "faultable"
would be a better fit, but based on Linus' request, I'm tempted to stick
with "syscall" for now.

Your concern is to name this in a way that is general and future-proof.
Linus' point was to make it syscall-specific rather than general. My
position is that we should wait until we face other use-cases (if we
ever do) before considering changing the name from "syscall" to something
more generic.
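
To make the intended use of the flag concrete, here is a minimal
userspace sketch of how a caller might select the reclaim primitive
from tracepoint_is_syscall(), following the comment this patch adds to
tracepoint.h. It is only an illustration, not kernel code: the structs
are trimmed mock-ups of the patched ones, and enum reclaim_kind,
pick_reclaim() and check_examples() are hypothetical names that do not
appear in the patch. The enum values merely stand in for call_rcu() and
call_rcu_tasks_trace(); nothing is actually queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace mock-up of the patched structure (assumption: trimmed copy). */
struct tracepoint_ext {
	int (*regfunc)(void);
	void (*unregfunc)(void);
	/* Flags. */
	unsigned int syscall:1;
};

/* Mock-up keeping only the field this sketch needs. */
struct tracepoint {
	const char *name;
	struct tracepoint_ext *ext;	/* NULL when there is no extended structure */
};

/* Same logic as the helper introduced by the patch. */
static inline bool tracepoint_is_syscall(struct tracepoint *tp)
{
	return tp->ext && tp->ext->syscall;
}

/* Hypothetical stand-ins for the two reclaim primitives. */
enum reclaim_kind {
	RECLAIM_CALL_RCU,		/* would be call_rcu() */
	RECLAIM_CALL_RCU_TASKS_TRACE,	/* would be call_rcu_tasks_trace() */
};

/* Hypothetical helper: choose the grace period type for batch reclaim. */
static inline enum reclaim_kind pick_reclaim(struct tracepoint *tp)
{
	return tracepoint_is_syscall(tp) ? RECLAIM_CALL_RCU_TASKS_TRACE
					 : RECLAIM_CALL_RCU;
}

/* Usage example covering the three cases a caller can encounter. */
static inline void check_examples(void)
{
	struct tracepoint_ext sys_ext = { .syscall = 1 };
	struct tracepoint_ext fn_ext = { .syscall = 0 };
	struct tracepoint sys_tp = { "sys_enter", &sys_ext };
	struct tracepoint fn_tp = { "with_regfunc", &fn_ext };
	struct tracepoint plain_tp = { "plain", NULL };

	assert(pick_reclaim(&sys_tp) == RECLAIM_CALL_RCU_TASKS_TRACE);
	assert(pick_reclaim(&fn_tp) == RECLAIM_CALL_RCU);
	assert(pick_reclaim(&plain_tp) == RECLAIM_CALL_RCU);
}
```

Note that a tracepoint without an extended structure falls back to the
call_rcu() case, which is why the helper checks tp->ext before reading
the flag.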

Thanks,

Mathieu


> -- Steve


  };
struct tracepoint {
diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h
index 83dc24ee8b13..93e70bc64533 100644
--- a/include/linux/tracepoint.h
+++ b/include/linux/tracepoint.h
@@ -104,6 +104,12 @@ void for_each_tracepoint_in_module(struct module *mod,
   * tracepoint_synchronize_unregister must be called between the last tracepoint
   * probe unregistration and the end of module exit to make sure there is no
   * caller executing a probe when it is freed.
+ *
+ * An alternative is to use the following for batch reclaim associated
+ * with a given tracepoint:
+ *
+ * - tracepoint_is_syscall() == false: call_rcu()
+ * - tracepoint_is_syscall() == true:  call_rcu_tasks_trace()
   */
  #ifdef CONFIG_TRACEPOINTS
  static inline void tracepoint_synchronize_unregister(void)
@@ -111,9 +117,17 @@ static inline void tracepoint_synchronize_unregister(void)
  	synchronize_rcu_tasks_trace();
  	synchronize_rcu();
  }
+static inline bool tracepoint_is_syscall(struct tracepoint *tp)
+{
+	return tp->ext && tp->ext->syscall;
+}
  #else
  static inline void tracepoint_synchronize_unregister(void)
  { }
+static inline bool tracepoint_is_syscall(struct tracepoint *tp)
+{
+	return false;
+}
  #endif
#ifdef CONFIG_HAVE_SYSCALL_TRACEPOINTS
@@ -345,6 +359,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
  	struct tracepoint_ext __tracepoint_ext_##_name = {		\
  		.regfunc = _reg,					\
  		.unregfunc = _unreg,					\
+		.syscall = false,					\
+	};								\
+	__DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args));
+
+#define DEFINE_TRACE_SYSCALL(_name, _reg, _unreg, _proto, _args)	\
+	struct tracepoint_ext __tracepoint_ext_##_name = {		\
+		.regfunc = _reg,					\
+		.unregfunc = _unreg,					\
+		.syscall = true,					\
  	};								\
  	__DEFINE_TRACE_EXT(_name, &__tracepoint_ext_##_name, PARAMS(_proto), PARAMS(_args));
@@ -389,6 +412,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
  #define __DECLARE_TRACE_SYSCALL	__DECLARE_TRACE
#define DEFINE_TRACE_FN(name, reg, unreg, proto, args)
+#define DEFINE_TRACE_SYSCALL(name, reg, unreg, proto, args)
  #define DEFINE_TRACE(name, proto, args)
  #define EXPORT_TRACEPOINT_SYMBOL_GPL(name)
  #define EXPORT_TRACEPOINT_SYMBOL(name)
diff --git a/include/trace/define_trace.h b/include/trace/define_trace.h
index ff5fa17a6259..63fea2218afa 100644
--- a/include/trace/define_trace.h
+++ b/include/trace/define_trace.h
@@ -48,7 +48,7 @@
#undef TRACE_EVENT_SYSCALL
  #define TRACE_EVENT_SYSCALL(name, proto, args, struct, assign, print, reg, unreg) \
-	DEFINE_TRACE_FN(name, reg, unreg, PARAMS(proto), PARAMS(args))
+	DEFINE_TRACE_SYSCALL(name, reg, unreg, PARAMS(proto), PARAMS(args))
#undef TRACE_EVENT_NOP
  #define TRACE_EVENT_NOP(name, proto, args, struct, assign, print)


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com




