On Tue, 2021-04-06 at 13:38 -0400, Steven Rostedt wrote:
> On Wed, 24 Feb 2021 19:22:48 +0000
> <Viktor.Rosendahl@xxxxxx> wrote:
>
> > > My question to you is, would you like to work on adding that feature to
> > > trace-cmd?
> >
> > I would be interested to work on the feature but I would probably need
> > permission from my employer, also in case I do it as a spare time project.
> >
> > For this reason, I will need some time to come to a decision. I will get
> > back to you when I know the answer.
>
> Hi Viktor,
>
> Any word on if your employer would let you work on trace-cmd?

Hi Steve,

The management was agreeable to it but I haven't had any spare time to work
on it. To be honest, I could have worked on it during my Easter holidays but
I just happened to busy myself with other private interests :)

>
> > > git://git.kernel.org/pub/scm/utils/trace-cmd/trace-cmd.git
> > >
> > > (which requires the libtracefs and libtracevent libraries).
> > >
> > > I'm thinking if someone were to do:
> > >
> > >   trace-cmd record -p wakeup_rt
> > >
> > > It would then act just like the latency-collector and record every new
> > > instance of a latency into the trace.dat file, where:
> > >
> > >   trace-cmd report
> > >
> > > would then nicely format that data for the user.
> > >
> > > Thoughts?
> >
> > It seems like trace-cmd is mostly GPL-2.0. The latency-collector is also
> > GPL-2.0, so I guess that it would be possible to take snippets of code and
> > inspiration from it.
> >
> > Would you allow the hackish random algorithm to be used in trace-cmd?
> > I mean the "--random" option in latency-collector.
> >
>
> If you feel it is useful, sure. Although a definition of what it exactly
> does needs to be a bit more explained. I'm not really sure what use the
> --random option is for the latency tracer.
>

The usefulness lies in the case where you have latencies that systematically
occur very close to each other. Let's suppose that we have some kernel code
like this:

static void foo(void)
{
        unsigned long flags;

        spin_lock_irqsave(&mylock, flags);
        do_something_interesting(); /* This always takes 6 milliseconds */
        spin_unlock_irqrestore(&mylock, flags);
}

static void bar(void)
{
        unsigned long flags;

        spin_lock_irqsave(&mylock, flags);
        do_something_else(); /* This takes several tens of milliseconds */
        spin_unlock_irqrestore(&mylock, flags);
}

void foobar(void)
{
        foo();
        // There could be plenty of code here; as long as it takes less than
        // one blackout period, we would have a problem. This could be
        // hundreds of ms.
        bar();
}
EXPORT_SYMBOL(foobar);

Let's assume that I believe that my soft realtime system may miss a deadline
if there are latencies in the kernel that are greater than 5 ms, so I use a
threshold of 5 ms when tracing.

Now, I start a long and expensive test campaign with many devices. If I use
the latency-collector without the --random option, because I think it's
unnecessary, then I will only ever find the latency caused by foo(), which
is 6 ms.

Specifically, what will happen is that at the end of foo(), ftrace will
generate that inotify event, which will wake up the latency-collector, which
opens the trace file. When the trace file is opened, there will be a blackout
of a few hundred milliseconds, and during this blackout no latencies will be
captured.
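Just to make that mechanism concrete, the kind of main loop I am describing
looks roughly like the following. This is only a sketch, not the actual
latency-collector code, and the exact file being watched is an assumption
for the sake of the example:

#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

#define TRACE_DIR       "/sys/kernel/tracing"
#define NOTIFY_FILE     TRACE_DIR "/tracing_max_latency"  /* assumed */
#define TRACE_FILE      TRACE_DIR "/trace"

int main(void)
{
        char buf[4096];
        ssize_t r;
        int fd, wd;

        fd = inotify_init();
        if (fd < 0) {
                perror("inotify_init");
                return EXIT_FAILURE;
        }

        wd = inotify_add_watch(fd, NOTIFY_FILE, IN_MODIFY);
        if (wd < 0) {
                perror("inotify_add_watch");
                return EXIT_FAILURE;
        }

        for (;;) {
                /* Block here until ftrace reports a new max latency */
                r = read(fd, buf, sizeof(buf));
                if (r < 0)
                        break;

                /*
                 * Opening and reading the trace file is what causes the
                 * blackout; latencies that end while we are busy here are
                 * lost.
                 */
                FILE *f = fopen(TRACE_FILE, "r");
                if (!f)
                        continue;
                while (fgets(buf, sizeof(buf), f))
                        fputs(buf, stdout);
                fclose(f);
        }
        return 0;
}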
The latency event from bar() would only be generated at the end of the bar()
function, and since bar() takes so long to execute, it's very likely that the
latency-collector has had time to react to the first latency and open the
trace file, which will cause the blackout to happen before we reach the end
of bar().

Perhaps I am able to fix the latency generated by foo(), so I believe that
everything is fine, which it is not, because I have that second, much longer
latency in bar() that I have not seen. Without the --random option, I would
at this point need to start another test campaign; only then would I start
seeing the longer latencies from bar().

On the other hand, if I use the --random option with the latency-collector,
then what will happen is that sometimes the latency-collector will open the
trace file immediately and other times it will sleep and wait for up to one
second before doing so. If it opens the file immediately, we will get the
first latency. If the random coin toss function decides to sleep, then we
will get the second. If a long test campaign is executed, and foobar() is
called several times during that campaign, then there is a good probability
that we will capture both the latency from foo() and the one from bar().

Now, --random is a bit more complicated than that, because it actually tosses
the coin again if another latency occurs while it is sleeping before opening
the file. The probabilities of that coin toss function are chosen so that, if
we assume that there is a maximum of N closely occurring latencies, we will
get each of them with probability 1/N. If the latency-collector actually
detects N closely occurring latencies, it will automatically increase N to
N + 1 and update the probabilities of the coin toss accordingly. (There is a
small sketch of this logic at the very end of this mail.)

So basically the idea is that by acting randomly, we will not systematically
lose a particular latency. It will also not make matters worse if we never
get latencies that occur close to each other; the only drawback is that we
sometimes wait for up to one second before opening the trace file, and that
doesn't really matter if there aren't any closely occurring latencies.

Admittedly, I have never detected any real-world case like the one described
above in the kernel, but I don't think that it's completely unthinkable.
Also, a real-world situation is likely to be more complicated, but it's
enough if we have one latency that always occurs before another. They need
not be immediately after each other either; it's enough if they are within
one blackout period of each other. I measured the blackout period at one
point on a system that I was working with and IIRC it was something like
450 ms. The exact length could depend on things like the depth of the stack
trace that needs to be unwound, etc.

Does this explanation make any sense to you?

best regards,

Viktor

> -- Steve
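P.S. Here is the small sketch of the coin toss logic that I mentioned above.
It is not the actual latency-collector code; the names and the structure are
made up for the example:

#include <stdbool.h>
#include <stdlib.h>

/* Return true with probability num/den */
static bool toss(int num, int den)
{
        return rand() % den < num;
}

/*
 * Called for latency number i (counting from 0) seen during one waiting
 * period, where nr_expected is the assumed maximum number of latencies
 * that can occur within one blackout period. Returns true if the trace
 * file should be opened now, false if we should keep sleeping and hope
 * for a later latency. With these probabilities, each of the nr_expected
 * latencies is captured with probability 1/nr_expected.
 */
static bool open_now(int i, int nr_expected)
{
        if (i >= nr_expected - 1)
                return true; /* the last one we expected, take it */
        return toss(1, nr_expected - i);
}

/*
 * Called when the waiting period is over and nr_seen closely occurring
 * latencies were observed. If we saw as many as we assumed possible,
 * assume that one more could occur the next time.
 */
static void update_expected(int nr_seen, int *nr_expected)
{
        if (nr_seen >= *nr_expected)
                *nr_expected = nr_seen + 1;
}

For example, with nr_expected == 2, the trace file is opened immediately with
probability 1/2 when the first latency arrives; if we instead decide to sleep
and a second latency arrives during the sleep, that one is always taken, so
both latencies end up being captured with probability 1/2.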