On Wed, 11 Dec 2013 12:53:55 +0100, Hans Verkuil <hverkuil@xxxxxxxxx> wrote:

> Hi Mauro,
> 
> On 12/11/13 11:44, Mauro Carvalho Chehab wrote:
> > Hans,
> >
> > On Wed, 11 Dec 2013 08:29:58 +0100,
> > Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> >
> >> On 12/10/2013 09:53 PM, Mauro Carvalho Chehab wrote:
> >>> On Sat, 23 Nov 2013 17:54:48 +0100,
> >>> Hans Verkuil <hverkuil@xxxxxxxxx> wrote:
> >>>
> >>>> On 11/23/2013 05:30 PM, Sylwester Nawrocki wrote:
> >>>>> Hi,
> >>>>>
> >>>>> On 11/23/2013 12:25 PM, Hans Verkuil wrote:
> >>>>>> Hi Wade,
> >>>>>>
> >>>>>> On 11/22/2013 08:48 PM, Wade Farnsworth wrote:
> >>>>>>> Add tracepoints to the QBUF and DQBUF ioctls to enable rudimentary
> >>>>>>> performance measurements using standard kernel tracers.
> >>>>>>>
> >>>>>>> Signed-off-by: Wade Farnsworth <wade_farnsworth@xxxxxxxxxx>
> >>>>>>> ---
> >>>>>>>
> >>>>>>> This is the update to the RFC patch I posted a few weeks back. I've added
> >>>>>>> several bits of metadata to the tracepoint output per Mauro's suggestion.
> >>>>>>
> >>>>>> I don't like this. All v4l2 ioctls can already be traced by doing e.g.
> >>>>>> echo 1 (or echo 2) > /sys/class/video4linux/video0/debug.
> >>>>>>
> >>>>>> So this code basically duplicates that functionality. It would be nice to be
> >>>>>> able to tie the existing tracing code (v4l2-ioctl.c) into tracepoints.
> >>>>>
> >>>>> I think it would be really nice to have this kind of support for standard
> >>>>> traces in the v4l2 subsystem. Presumably it could even gradually replace
> >>>>> the custom v4l2 debug infrastructure.
> >>>>>
> >>>>> If I understand things correctly, the current tracing/profiling
> >>>>> infrastructure is much less invasive than inserting printks all over,
> >>>>> which may cause changes in control flow. I doubt the system could be
> >>>>> reliably profiled by enabling all those debug prints.
> >>>>>
> >>>>> So my vote would be to add support for standard tracers, as in other
> >>>>> subsystems in the kernel.
> >>>>
> >>>> The reason for the current system is to trace which ioctls are called in
> >>>> what order by a misbehaving application. It's very useful for that,
> >>>> especially when trying to debug user problems.
> >>>>
> >>>> I don't mind switching to tracepoints as long as this functionality is
> >>>> kept one way or another.
> >>>
> >>> I agree with Sylwester: we should move to tracepoints, and this is a good
> >>> start.
> >>
> >> As I mentioned, I don't mind switching to tracepoints, but not in the way the
> >> current patch does it. I certainly don't agree with you merging this patch
> >> as-is without further discussion.
> >
> > Let's not mix the subjects here. There are two different demands that can be
> > fulfilled either by the same solution or by different ones:
> > 1) To have a way to enable debugging of all parameters passed from/to
> >    userspace;
> > 2) To be able to measure the driver (and core) performance while
> >    streaming.
> >
> > This patch's scope is to solve (2), by allowing userspace to hook the two
> > ioctls that queue/dequeue streaming buffers.
> >
> > With that in mind, what's wrong with this patch? I couldn't see anything bad
> > there. It is not meant to solve (1).
> 
> I have two problems with it:
> 
> 1) Right now it is just checking for two ioctls, but as soon as we add more
> tracepoints you get another switch on the ioctl command, something I've tried
> to avoid by creating the table. So a table-oriented implementation would be
> much preferred.

From a performance measurement PoV, I don't see any reason to add it to any
other ioctl. Of course, if we revisit it and other traces become needed, then
we should use a table approach instead.

> 
> 2) It duplicates the already existing code to dump the v4l2_buffer struct.
> Is there a way to have just one copy of that?
> I was hoping Wade could look
> into that, and if not, then it is something I can explore (although finding
> time is an issue).

Having implemented tracepoints myself, I'd say that the answer is no.

Let me give you an example. The ras:aer_event trace is a simple tracepoint
that also uses the __print_flags() macro. It is implemented in
include/trace/events/ras.h as:

	#define aer_correctable_errors			\
		{BIT(0),  "Receiver Error"},		\
		{BIT(6),  "Bad TLP"},			\
		{BIT(7),  "Bad DLLP"},			\
		{BIT(8),  "RELAY_NUM Rollover"},	\
		{BIT(12), "Replay Timer Timeout"},	\
		{BIT(13), "Advisory Non-Fatal"}

	#define aer_uncorrectable_errors		\
		{BIT(4),  "Data Link Protocol"},	\
		{BIT(12), "Poisoned TLP"},		\
		{BIT(13), "Flow Control Protocol"},	\
		{BIT(14), "Completion Timeout"},	\
		{BIT(15), "Completer Abort"},		\
		{BIT(16), "Unexpected Completion"},	\
		{BIT(17), "Receiver Overflow"},		\
		{BIT(18), "Malformed TLP"},		\
		{BIT(19), "ECRC"},			\
		{BIT(20), "Unsupported Request"}

	TRACE_EVENT(aer_event,
		TP_PROTO(const char *dev_name,
			 const u32 status,
			 const u8 severity),

		TP_ARGS(dev_name, status, severity),

		TP_STRUCT__entry(
			__string(dev_name, dev_name)
			__field(u32, status)
			__field(u8, severity)
		),

		TP_fast_assign(
			__assign_str(dev_name, dev_name);
			__entry->status = status;
			__entry->severity = severity;
		),

		TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
			__get_str(dev_name),
			__entry->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
				__entry->severity == HW_EVENT_ERR_FATAL ? "Fatal" :
				"Uncorrected",
			__entry->severity == HW_EVENT_ERR_CORRECTED ?
				__print_flags(__entry->status, "|", aer_correctable_errors) :
				__print_flags(__entry->status, "|", aer_uncorrectable_errors))
	);

Tracepoints work by having a binary ring buffer to which the data described
in TP_STRUCT__entry is written. In this case, it passes a string, a u32 and
a u8.
The TP_printk() macro actually produces a "format" file node, describing each
field that will be filled in the ring buffer and how to convert the data
structure from binary to text:

	# cat /sys/kernel/debug/tracing/events/ras/aer_event/format
	name: aer_event
	ID: 877
	format:
		field:unsigned short common_type;	offset:0;	size:2;	signed:0;
		field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
		field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
		field:int common_pid;	offset:4;	size:4;	signed:1;

		field:__data_loc char[] dev_name;	offset:8;	size:4;	signed:1;
		field:u32 status;	offset:12;	size:4;	signed:0;
		field:u8 severity;	offset:16;	size:1;	signed:0;

	print fmt: "%s PCIe Bus Error: severity=%s, %s\n", __get_str(dev_name),
	REC->severity == HW_EVENT_ERR_CORRECTED ? "Corrected" :
	REC->severity == HW_EVENT_ERR_FATAL ? "Fatal" : "Uncorrected",
	REC->severity == HW_EVENT_ERR_CORRECTED ?
	__print_flags(REC->status, "|", {(1UL << (0)), "Receiver Error"},
	{(1UL << (6)), "Bad TLP"}, {(1UL << (7)), "Bad DLLP"},
	{(1UL << (8)), "RELAY_NUM Rollover"}, {(1UL << (12)), "Replay Timer Timeout"},
	{(1UL << (13)), "Advisory Non-Fatal"}) :
	__print_flags(REC->status, "|", {(1UL << (4)), "Data Link Protocol"},
	{(1UL << (12)), "Poisoned TLP"}, {(1UL << (13)), "Flow Control Protocol"},
	{(1UL << (14)), "Completion Timeout"}, {(1UL << (15)), "Completer Abort"},
	{(1UL << (16)), "Unexpected Completion"}, {(1UL << (17)), "Receiver Overflow"},
	{(1UL << (18)), "Malformed TLP"}, {(1UL << (19)), "ECRC"},
	{(1UL << (20)), "Unsupported Request"})

You can see that __print_flags() actually produced a lot of code inside
"print fmt". When someone reads from /sys/kernel/debug/tracing/trace, the
print fmt is handled by the kernel.
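To make the mechanism concrete in userspace terms, here is a hedged Python
sketch (invented for illustration, not rasdaemon code) showing how the
"field:" lines of a format file like the one above can drive the decoding of
a raw binary record; the field layout is taken from the aer_event dump, and
little-endian byte order is assumed:

```python
import re

# Field descriptions as they appear in the tracepoint "format" file above.
FORMAT_TEXT = """\
field:unsigned short common_type; offset:0; size:2; signed:0;
field:unsigned char common_flags; offset:2; size:1; signed:0;
field:unsigned char common_preempt_count; offset:3; size:1; signed:0;
field:int common_pid; offset:4; size:4; signed:1;
field:__data_loc char[] dev_name; offset:8; size:4; signed:1;
field:u32 status; offset:12; size:4; signed:0;
field:u8 severity; offset:16; size:1; signed:0;
"""

FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*"
    r"size:(?P<size>\d+);\s*signed:(?P<signed>[01]);")

def parse_format(text):
    """Return a list of (name, offset, size, signed) tuples."""
    fields = []
    for m in FIELD_RE.finditer(text):
        name = m.group("decl").split()[-1]   # last token of the C declaration
        fields.append((name, int(m.group("offset")),
                       int(m.group("size")), m.group("signed") == "1"))
    return fields

def decode_record(fields, raw):
    """Decode the fixed-size fields of one binary record (little-endian)."""
    return {name: int.from_bytes(raw[offset:offset + size], "little",
                                 signed=signed)
            for name, offset, size, signed in fields}

if __name__ == "__main__":
    fields = parse_format(FORMAT_TEXT)
    # Fake 17-byte record: status = BIT(6) ("Bad TLP"), everything else zero.
    raw = bytearray(17)
    raw[12:16] = (1 << 6).to_bytes(4, "little")
    print(decode_record(fields, bytes(raw)))
```

This is roughly what the perf/rasdaemon event-parsing code does with the real
per-CPU ring buffers, minus the handling of `__data_loc` strings.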
However, when someone uses a performance tool like perf, the binary blobs are
read from the per-CPU raw trace file nodes:

	/sys/kernel/debug/tracing/per_cpu/cpu*/trace_pipe_raw

And the "print fmt" line in the format file is just a hint for the userspace
tracing tool, which it may or may not use. For example, in rasdaemon I'm
overriding all the default handlers with my own, as I want to store all
events in a sqlite database:

	https://git.fedorahosted.org/cgit/rasdaemon.git/tree/ras-aer-handler.c

> 
> Bottom line as far as I am concerned: it was merged while I have outstanding
> questions.

Well, only in this email are you making your concerns clear.

> If there are good technical reasons why the two problems stated above can't
> be easily resolved, then I will reconsider, but for now I think it is too early
> to merge. For the record, I don't think I mentioned the first problem in my
> original reply; it's something I noticed yesterday while looking at the patch
> again.
> 
> > 
> > Before this patch, all we have is (1), as a non-dynamic printk is too
> > intrusive to be used for performance measurement. So, there's no way to
> > measure how much time a buffer takes to be filled.
> 
> Having tracepoints here doesn't tell you anything about that. At best it gives
> you information about the jitter.
> 
> How are these tracepoints supposed to be used? What information will they
> provide? Are tracepoints expected to be limited to QBUF/DQBUF? Or other
> ioctls/fops as well? If so, which ones?

As shown above, a performance tool can add handlers for tracepoint data,
allowing one to discover, for example, whether dequeues occurred in a
different order than queues, to associate each dequeued buffer with the
corresponding queued one, and to measure the time between each queue/dequeue
pair.

> 
> I just have too many questions and I'd like to see some answers before something
> like this is merged in the v4l2 core.
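The queue/dequeue analysis described above could be sketched in userspace
roughly as follows. This is a hedged illustration: the event names, fields
and sample timestamps are invented, not the output of the merged patch, and
the input is assumed to be already extracted from the trace by a tool like
perf:

```python
def pair_latencies(events):
    """events: iterable of (timestamp, "qbuf"|"dqbuf", buffer_index) tuples.

    Returns ({buffer_index: dequeue_latency}, reordered_flag), pairing each
    dequeued buffer with its queue event and noting out-of-order dequeues.
    """
    queued = {}            # buffer index -> qbuf timestamp
    latencies = {}
    queue_order, dequeue_order = [], []
    for ts, kind, idx in events:
        if kind == "qbuf":
            queued[idx] = ts
            queue_order.append(idx)
        elif kind == "dqbuf":
            latencies[idx] = ts - queued.pop(idx)
            dequeue_order.append(idx)
    reordered = dequeue_order != queue_order[:len(dequeue_order)]
    return latencies, reordered

if __name__ == "__main__":
    trace = [
        (0.000, "qbuf", 0), (0.010, "qbuf", 1), (0.020, "qbuf", 2),
        (0.033, "dqbuf", 0), (0.066, "dqbuf", 2), (0.070, "dqbuf", 1),
    ]
    lat, reordered = pair_latencies(trace)
    print(lat, reordered)   # buffer 2 was dequeued before buffer 1
```

A real handler would of course read the raw per-CPU pipes instead of a
Python list, but the pairing logic is the same.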
> 
> Rereading my older replies I see that I wasn't particularly clear about what
> my objections to the patch were, for which I apologize.
> 
> Nevertheless, I stand by my opinion that this patch needs more discussion.
> 
> Wade, I'd appreciate it if you could give some more insight into how you are
> using this and what it gives you.
> 
> > As a matter of fact, in case you didn't notice, we are already somewhat
> > moving to traces, as several drivers are now using dynamic_printk for debug
> > messages. Yet, lots of drivers still use their own normal
> > printk-based debug macros.
> >
> > Now, you're proposing to use the same solution for (1) too.
> >
> > OK, we can go in that direction, but this is a separate issue. Converting
> > v4l2-ioctl to use tracepoints is the easiest part. However, there are
> > several things to consider, if we decide to use it for debug purposes:
> >
> > a) It will likely require converting all printks to dynamic ones, as we
> > want to have all debug messages serialized.
> >
> > Let me explain it better: if some debug messages come via the standard
> > printk mechanism while others come via traces, we lose the capability
> > of receiving the messages in the same order that they were produced, and
> > it could be harder, if not impossible, to synchronize them into the right
> > order.
> >
> > b) It may make sense to write a userspace tool similar to the rasdaemon
> > [1] to allow controlling the debug output. That tool directly reads the
> > events from the raw tracing queues, formatting the data in userspace and
> > hooking only the events that are needed.
> >
> > Currently, the rasdaemon contains a copy of part of the code used by the
> > perf tool (as this code is not inside any library yet). However,
> > that code is being prepared to be used as a library. So, it should be
> > even easier to write such a tool in the near future.
> >
> > [1] https://git.fedorahosted.org/cgit/rasdaemon.git/
> >
> > c) I don't think that tracepoints are the right mechanism for post-mortem
> > analysis [2]. E.g., if the kernel crashes, a printk-based mechanism will
> > output something, but I'm not sure if doing the same is possible with traces
> > (e.g., forcing the trace messages to go to the tty).
> >
> > You should notice that, with the dynamic_printk macros, it is possible to
> > compile the kernel to use normal printk macros for the debug macros.
> >
> > However, if we convert the ioctl printks to tracepoints, we may still
> > need to have a fallback printk mechanism for ioctl debug messages.
> >
> > [2] There are actually some discussions on the RAS mailing lists about
> > how to store the tracing data and printk messages on newer hardware that
> > offers a non-volatile memory mechanism for that. So, we might actually
> > have a post-mortem mechanism for tracepoints in the future, depending on
> > the hardware. Yet, not all types of hardware will have such NVRAM.
> >
> > Btw, it is possible to use strace to trace all ioctls. So, the existing
> > mechanism is actually a duplicated feature, as userspace already covers it.
> > However, in practice, we opted to explicitly duplicate that feature at the
> > kernel level, for debugging purposes, because strace doesn't solve (a)
> > and (c).
> >
> > What I'm saying is that a project to convert the debug facility into
> > tracepoints is something that:
> 
> That's not what I want to do. What I would like to avoid is duplicating
> the code to dump ioctl arguments. Perhaps tracepoints can call a specified
> function to dump the arguments? I tried to understand the tracepoint
> implementation but I got lost in a maze of macros...

As shown above, tracepoints use a completely different way to output data
than printk.
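The serialization concern from item (a) above, that messages arriving partly
via printk and partly via traces lose their relative order, can be worked
around in userspace by merging the two streams by timestamp, at the cost of
depending on clock resolution. A hedged Python sketch with invented sample
messages:

```python
import heapq

def merge_streams(printk_msgs, trace_msgs):
    """Merge two timestamp-sorted lists of (timestamp, text) messages into
    one timestamp-ordered stream; ordering is only as good as the clocks."""
    return list(heapq.merge(printk_msgs, trace_msgs, key=lambda m: m[0]))

if __name__ == "__main__":
    # Hypothetical debug output: printk-based driver messages interleaved
    # with tracepoint events (names are illustrative, not real output).
    printk = [(1.0, "vidioc_querycap"), (3.0, "vidioc_s_fmt")]
    traces = [(2.0, "v4l2:qbuf index=0"), (4.0, "v4l2:dqbuf index=0")]
    for ts, text in merge_streams(printk, traces):
        print(ts, text)
```

This only recovers the order up to timestamp precision, which is exactly why
the email argues for converting everything to one mechanism instead.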
> 
> > - it is independent of adding some facility for performance
> > measurement (although it may share the same infrastructure);
> >
> > - it will require lots of effort to change every single driver
> > to use dynamic printks;
> >
> > - it may require writing a helper tool to enable just the v4l2
> > traces and eventually collect them;
> >
> > - it may require a fallback printk mechanism in order to debug
> > kernel panics.
> 
> >> To make it official:
> >>
> >> Nacked-by: Hans Verkuil <hans.verkuil@xxxxxxxxx>
> >>
> >> If we do tracepoints, then we do it right and for all ioctls, not in this
> >> half-baked manner.
> >>
> >> Please revert.
> >
> > Let's not mix things. If this patch is correct and solves the performance
> > measurement problem, I don't see any reason to revert.
> 
> It duplicates code for dumping ioctl arguments, I would prefer a table-based
> solution if possible, and I am not convinced of the usefulness of tracepoints
> here for performance measurements (at the very least I'd like to know more
> about the use case and for which ioctls and file operations tracepoints might
> be useful).
> 
> In other words: this patch needs more discussion. I wouldn't care if this
> were in a driver, but for the v4l2 core it's too early to merge.
> 
> > Using tracepoints for debug is a separate and independent project.
> 
> I agree, but that's not what my concerns are about (and I am not sure
> whether a project like that would be desirable/necessary at all).
> 
> > Such a project requires lots of tests to prove that it is a viable solution,
> > several discussions, and a huge time frame to convert all drivers to use
> > tracepoints too, plus some post-mortem debug logic.
> 
> Regards,
> 
> 	Hans
> 

Cheers,
Mauro
--
To unsubscribe from this list: send the line "unsubscribe linux-media" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html