On Wed, Aug 2, 2023 at 11:07 PM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote: > > On Wed, 2 Aug 2023 08:10:00 -0400 > Ze Gao <zegao2021@xxxxxxxxx> wrote: > > > From: Ze Gao <zegao2021@xxxxxxxxx> > > > > Internal representations of task state are likely to be changed > > or ordered, and reporting them to userspace without exporting > > them as part of API is basically wrong, which can easily break > > a userspace observability tool as kernel evolves. For example, > > perf suffers from this and still reports wrong states as of this > > writing. > > > > OTOH, some masqueraded states like TASK_REPORT_IDLE and > > TASK_REPORT_MAX are also reported inadvertently, which confuses > > things even more and most userspace tools do not even take them > > into consideration. > > > > So add a new variable in company with the old raw value to > > report task state in symbolic chars, which are self-explaining > > and no further translation is needed. Of course this does not > > break any userspace tool. > > > > Note for PREEMPT_ACTIVE, we introduce 'p' to report it and use > > the old conventions for the rest. > > > > Signed-off-by: Ze Gao <zegao@xxxxxxxxxxx> > > Reviewed-by: Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx> > > Acked-by: Ian Rogers <irogers@xxxxxxxxxx> > > --- > > include/trace/events/sched.h | 44 ++++++++++++++++++++++-------------- > > 1 file changed, 27 insertions(+), 17 deletions(-) > > > > diff --git a/include/trace/events/sched.h b/include/trace/events/sched.h > > index 7d34db20b2c6..1c7b94793495 100644 > > --- a/include/trace/events/sched.h > > +++ b/include/trace/events/sched.h > > @@ -6,6 +6,7 @@ > > #define _TRACE_SCHED_H > > > > #include <linux/kthread.h> > > +#include <linux/sched.h> > > #include <linux/sched/numa_balancing.h> > > #include <linux/tracepoint.h> > > #include <linux/binfmts.h> > > @@ -214,6 +215,27 @@ static inline int __trace_sched_switch_state(bool preempt, > > > > return state ? (1 << (state - 1)) : state; > > } > > + > > +static inline char __trace_sched_switch_state_char(bool preempt, > > + unsigned int prev_state, > > + struct task_struct *p) > > +{ > > + long state; > > + > > +#ifdef CONFIG_SCHED_DEBUG > > + BUG_ON(p != current); > > BUG? Why not WARN_ON()? I directly copied it from __trace_sched_switch_state since they are very similar. I had doubt on this too but decided to keep it in case people want to be 100% sure that the current task is exactly the one that is being switched, otherwise it's a fatal problem for scheduler at the point where trace_sched_switch is called. If you think WARN_ON_ONCE is more appropriate, I can fix both in v6. Thoughts? Regards, Ze