Re: RFC: Lightweight tracing mechanism

Marc-André Lureau <marcandre.lureau@xxxxxxxxxx> · Wed, 7 Jun 2017 11:30:24 -0400 (EDT)

Hi

----- Original Message -----
> 
> > On 7 Jun 2017, at 13:51, Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> > wrote:
> > 
> > Hi
> > 
> > ----- Original Message -----
> >> 
> >>> On 6 Jun 2017, at 20:45, Marc-André Lureau <marcandre.lureau@xxxxxxxxxx>
> >>> wrote:
> >>> 
> >>> I don't think any of these measurements couldn't be reported accurately
> >>> with a regular log or structured log. Thus I don't understand what
> >>> problem
> >>> you try to solve.
> >>> 
> >>> Sorry, but I also have a hard time to understand what your tracing
> >>> library
> >>> does. Could you point to a simple example and short doc?
> >> 
> >> Probably should have started with that. I thought this was relatively
> >> obvious
> >> from the code and the comments I had put in the initial mail. Apparently
> >> not, sorry.
> >> 
> >> Purpose: The tracing / tweaking mechanism (not a library, just a couple of
> >> macro and a multiply-included .def file) is to offer an easy, scalable,
> >> consistent way to add debug code that is intended to stay in the code, and
> >> provides generally useful, yet focused information or program behavior
> >> alterations. Examples include: showing information about a specific
> >> subsystem (memory), showing a specific kind of frequently used measurement
> >> (FPS, network utilization), tweaking internal parameters eg. to make the
> >> system more easily hit a corner case (e.g. put constrants on bandwidth or
> >> memory usage, tweak image compression parameters, etc.
> >> 
> >> This mechanism lets you add two categories of configurable values in the
> >> code:
> >> 1. “traces”, which are boolean flags, named “traces" because they are most
> >> often used to enable a specific set of runtime traces.
> > 
> > Ok, that's where the confusion came from. Your solution is not a a complete
> > tracing solution (like dtrace, ust etc),
> 
> Those are not complete solutions for my purpose, otherwise I could use them,
> and they would have been used in Spice already.

They are not use because we don't have such problems in the first place.

> 
> > but rather a way to have runtime debugging enabled/disabled/listed by
> > categories, which can be used on top of glog/fprintf etc. However, it
> > overlaps with other tracing solution that provides different means.
> 
> How does it overlap? If I can’t use your solutions for Spice code, then it
> does not overlap. I think I demonstrated I can not use GTK flags, for
> example, because they were not designed to be used outside of glib itself.
> None of the suggestions you made so far solves my  problem. It’s not enough
> for something to exist, be known, trusted, powerful, whatever. It first and
> foremost has to be useful for what I want to do.
> 
> I have not ruled out the alternatives you talked about out of ignorance. I
> want a scalpel to do some specific surgery, and shared the kind of surgery I
> had in mind. It’s not helpful to suggest I use a chainsaw (more powerful),
> an ax (widely used), a Swiss army knife (standard), or a #ifdef hammer
> (while at the same time warning me about hammer-based surgery leading to
> dead code/patients…).
> 
> 
> > As I said, I think filtering/greping log is actually more powerful, for the
> > current logs. And structured logs will also improve the situation
> > (query/filter on specific fields).
> 
> Logging everything and filtering with grep is not “powerful” at all. It’s
> inefficient enough that in projects following that approach, most developers
> don’t leave their debug printfs in the code, no matter how useful they are.
> I see some dead debug code in spice today for that very reason, like
> CO_DEBUG…

CO_DEBUG can be enabled by tweaking the #if 0 line. In all cases, this code is unmaintained. I have put some effort in a gcoroutine library instead. But the thread coroutine implementation is quite useless, in fact it got removed in qemu recently (the origin of the implementation, they share the same author)

> 
> 
> > Now, for performance sensitive code, traditionally, #ifdef code is ok, but
> > it tends to linger and become obsolete.
> 
> Which is exactly why I am not proposing #ifdef code.
> 
> 
> > In spice-gtk, there is also spice_util_get_debug(), but that will enable
> > all extra logging/measurements and may not optimal.
> 
> It’s very suboptimal, indeed.

It depends what for. You should first tell what you want to trace that is so performance sensitive that it wouldn't fit with the current log.

> 
> > In all cases, I question the need for having such performance-sensitive
> > code in releases, or even in upstream code, as it tends to become useless
> > when the performance problem is considered solved, or when the people
> > looking at the esoteric logs move to something else (based on my
> > experience).
> 
> There, your experience contradicts mine, but it’s entirely logical, because
> your experience seems to be based on using grep and #ifdef logs, which
> indeed suffers from the problems you point out, whereas mine is based on
> using mechanisms similar to the one I just proposed, where whoever wants
> output X asks for output X and does not pollute anybody else.

I don't have problems with debug log. I usually don't have performance-sensitive code to trace, or that I would want to keep checked in code. If I had it, I would look at existing tracing solution, that provides means to analyse them with the rest of the system.

> 
> 
> > Imho, none your examples seem performance-sensitive to me, and could be in
> > the debug log by default.
> 
> I did not construct the examples to be performance-sensitive, but as
> pedagogical illustrations. But of course, the solution is specifically
> designed so that you can add fine-tuned instrumentation in
> performance-sensitive code. I hinted multiple times that this was one of the
> motivations for such a mechanism.
> 
> Instrumentation like this in the marshalling or read/write code would have
> helped me spot the reason the macOS client was not working, by the way.

What would be the change/trace that would have helped you?

> 
> > If you would trace some value inside a tight loop, or hot path, for ex,
> > that would be different.
> 
> And that’s some of the work I want to do, and I cannot do it at the moment
> because spice lacks the required mechanism. This is precisely what this
> proposal is about.

My point is that your solution doesn't help though, because when you have traces in hot path, you need further tools to process them.

> > But who would really need it and be able to process that kind of data,
> > without extra tools for analysis etc?
> 
> Well, there’s me to start with ;-) But experience tells me that it does not
> take long for a mechanism like this to start being used.

If it's just providing new command line options and filtering, it will just be yet another way of doing debugging. By contrast, grep is universal :) To give you an idea, I never use libvirt log filters, that I never remember how to use properly (http://libvirt.org/guide/html/Application_Development_Guide-Connections-Debug.html). But I know how to use grep since my 1st day of programming...

> 
> > That's where systemdtap/lttng etc are good at, I believe.
> 
> I don’t see how they would respond to my needs. How would you implement the
> FPS or fault injection examples with systemtap or lttng? Weren’t you the one

This doesn't need performance-sensitive trace producing gigabytes of data. You can just log it normally.

> who recently pointed out you were happy to get rid of log4cpp? So why
> suggest lttng, which is even more complicated, as far as I can tell
> Linux-only, and apparently requires round trips to the kernel and special
> tools just to see what has been logged? [Digression: this somehow remind me
> of some code I once read where the author was writing in a file using
> system(“echo message > file”)… Why would you want to use fprintf when system
> and echo do the job?…]

I suggest it for performance-sensitive traces, and related analysis tools.

> 
> 
> > 
> >> 2. “tweaks”, which are integer values, used to tweak the behaviour of the
> >> code, e.g. to trigger rare / corner case behaviours.
> > 
> > Now this is even further away from tracing.
> 
> Again, my experience with similar mechanisms in multiple products before
> shows me otherwise. And since you mentioned Linux tracing mechanisms, let me
> point out that kernel configurable parameters follow a similar pattern. You
> can set configurable parameters like the file-max “tweak" through the same
> mechanism used you configure kernel traces (in that case, /proc or /sys).
> 
> 
> > mho, existing argument parsing (for end-users) or environment parsing
> > (mostly for devs, or global tweaks) are quite fine here.
> 
> But that’s what I’m doing. I’m only automating the process, so that for debug
> and trace control, it’s always the same, self-documented and consistent.
> 
> > We could better document the existing environment variables, beside that,
> > what does "tweaks" bring?
> 
> 1. Self-documentation, no need to better document existing environment
> variables or options or whatever. They are all in one place, and you can get
> the available traces from a compiled program using -t list.
> 
> 2. Ease of use: Adding a trace flag is one line of code. And when you use it,
> you get the output straight on stderr, easy to look at, no additional tool
> needed (not even grep).
> 
> 3. Efficiency / scalability. You can have tens of named traces and tweaks.
> You don’t want to have tens of undocumented, purpose-specific environment
> variables, each with their own rule and syntax.

There is consistent user argument parsing/help with GOption.

But for such debugging tweaks, I would rather stick to environment variables, which are usually documented in man pages if they are useful for global system configuration, and a developer text file in the source repository for the rest. The developper options are usually best left compiled out unless the program/library is built with --enable-debug. We shouldn't expose developer tweaks to the user, as this will likely result in breakage or a variety of issues due to the combinations of parameters: I don't want -t (or whatever) to list various user-tweakable parameters. When we add user visible parameters, we must be careful about its stability. Furthermore, the less the options, the better for everyone.

To sum up, either its a performance-sensitive thing to trace, or you generate a lot of data, in which case there are existing tools.

Or it's not performance-sensitive, and can be in production builds and printf/structured logging & grep/query are your best friends.

Or you have compile time --enable-debug and various environment variables to tweak the developer-only dumps.

True, I am very accustomed to this and I find it annoying to have adapt to any project that deviates from this scheme.

What you propose doesn't seem better than making a list of categories and using structured logging, plus using --enable-debug and getenv variables for developpers tweaks.

But then again, I may not fully understand the problem you try to solve :)
_______________________________________________
Spice-devel mailing list
Spice-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/spice-devel