Re: [PATCH] vsprintf: simplify number handling

Steven Rostedt <rostedt@xxxxxxxxxxx> · Wed, 18 Dec 2024 13:31:26 -0500

On Wed, 18 Dec 2024 13:04:27 -0500
Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> > It's very much a part of the standard printf format, and is very much
> > inherent to the whole varargs and C integer promotion rules (ie you
> > literally *cannot* pass an actual 'char' value to a varargs function -
> > the normal C integer type extension rules apply).
> > 
> > So this is not really some odd kernel extension, and while there are
> > only a handful of users in tracing (that apparently trace-cmd cannot
> > deal with), it's not even _that_ uncommon in general:  
> 
> trace-cmd (and libtraceevent for that matter) does handle "%h" and "%hh"
> as well.
> 
> But the vbin_printf() which trace_printk() uses is a different beast, and
> requires rebuilding the arguments so that it can be parsed, and there "%h"
> isn't supported.

Just to state the difference between TP_printk() and trace_printk() is that
with trace events only the data is saved to the ring buffer. For example,
for the sched_waking trace event:

TRACE_EVENT(sched_waking,

	TP_PROTO(struct task_struct *p),

	TP_ARGS(__perf_task(p)),

	TP_STRUCT__entry(
		__array(	char,	comm,	TASK_COMM_LEN	)
		__field(	pid_t,	pid			)
		__field(	int,	prio			)
		__field(	int,	target_cpu		)
	),

	TP_fast_assign(
		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
		__entry->pid		= p->pid;
		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
		__entry->target_cpu	= task_cpu(p);
	),

	TP_printk("comm=%s pid=%d prio=%d target_cpu=%03d",
		  __entry->comm, __entry->pid, __entry->prio,
		  __entry->target_cpu)
);

[ Note, I made this into a TRACE_EVENT() but in reality it's multiple
  events that uses DECLARE_EVENT_CLASS() and DEFINE_EVENT(), but the idea is
  still the same. ]

	TP_STRUCT__entry(
		__array(	char,	comm,	TASK_COMM_LEN	)
		__field(	pid_t,	pid			)
		__field(	int,	prio			)
		__field(	int,	target_cpu		)
	),

That turns into:

	struct trace_event_raw_sched_waking {
		struct trace_entry		ent;
		char				comm[TASK_COMM_LEN];
		pid_t				pid;
		int				prio;
		int				target_cpu;
		char				__data[];
	}

Then we have how to load that structure:

	TP_fast_assign(
		memcpy(__entry->comm, p->comm, TASK_COMM_LEN);
		__entry->pid		= p->pid;
		__entry->prio		= p->prio; /* XXX SCHED_DEADLINE */
		__entry->target_cpu	= task_cpu(p);
	),

Where the "__entry" is of type struct trace_event_raw_sched_waking and
points into the reserved location in the ring buffer. This has the above
assignments write directly into the ring buffer and avoids any copying.

Now the "trace" file needs to know how to print it, that's where the
TP_printk() is. It is basically a vsprintf(TP_printk()) with the __entry
again pointing to the content of the ring buffer.

But trace_printk() is not a trace event and requires writing the format as
well as the data into the ring buffer when it is called. It use to simply
just use vsnprintf() but it was considered much faster to not do the
formatting during the record and to push it back to when it is read. As
trace_printk() is used specifically to find hard to hit bugs, to keep it
from causing "heisenbugs", using vbin_printf() proved to be much faster and
made trace_printk() less intrusive to debugging.

For historical analysis, here's where it was first introduced:

  https://lore.kernel.org/lkml/49aa0c73.1c07d00a.4fc6.ffffb4d7@xxxxxxxxxxxxx/

-- Steve