On Thu, Sep 01, 2022 at 05:38:44PM -0400, Steven Rostedt wrote: > On Tue, 30 Aug 2022 14:49:16 -0700 > Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote: > > > From: Kent Overstreet <kent.overstreet@xxxxxxxxx> > > > > This adds the ability to easily instrument code for measuring latency. > > To use, add the following to calls to your code, at the start and end of > > the event you wish to measure: > > > > code_tag_time_stats_start(start_time); > > code_tag_time_stats_finish(start_time); > > So you need to modify the code to see what you want? Figuring out the _correct_ place to measure is often a significant amount of the total effort. Having done so once, why not annotate that in the source code? > For function length you could just do something like this: > > # cd /sys/kernel/tracing > # echo __skb_wait_for_more_packets > set_ftrace_filter > # echo 1 > function_profile_enabled > # cat trace_stat/function* > Function Hit Time Avg s^2 > -------- --- ---- --- --- > __skb_wait_for_more_packets 1 0.000 us 0.000 us 0.000 us > Function Hit Time Avg s^2 > -------- --- ---- --- --- > __skb_wait_for_more_packets 1 74.813 us 74.813 us 0.000 us > Function Hit Time Avg s^2 > -------- --- ---- --- --- > Function Hit Time Avg s^2 > -------- --- ---- --- --- > > The above is for a 4 CPU machine. The s^2 is the square of the standard > deviation (makes not having to do divisions while it runs). > > But if you are looking for latency between two events (which can be kprobes > too, where you do not need to rebuild your kernel): > > From: https://man.archlinux.org/man/sqlhist.1.en > which comes in: https://git.kernel.org/pub/scm/libs/libtrace/libtracefs.git/ > if not already installed on your distro. > > # sqlhist -e -n wakeup_lat 'select end.next_comm as comm,start.pid,start.prio,(end.TIMESTAMP_USECS - start.TIMESTAMP_USECS) as delta from sched_waking as start join sched_switch as end on start.pid = end.next_pid where start.prio < 100' > > The above creates a synthetic event called "wakeup_lat" that joins two > events (sched_waking and sched_switch) when the pid field of sched_waking > matches the next_pid field of sched_switch. When there is a match, it will > trigger the wakeup_lat event only if the prio of the sched_waking event is > less than 100 (which in the kernel means any real-time task). The > wakeup_lat event will record the next_comm (as comm field), the pid of > woken task and the time delta in microseconds between the two events. So this looks like it's gotten better since I last looked, but it's still not there yet. Part of the problem is that the tracepoints themselves are in the wrong place: your end event is when a task is woken up, but that means spurious wakeups will cause one wait_event() call to be reported as multiple smaller waits, not one long wait - oops, now I can't actually find the thing that's causing my multi-second delay. Also, in your example you don't have it broken out by callsite. That would be the first thing I'd need for any real world debugging. So, it looks like tracing has made some progress over the past 10 years, but for debugging latency issues it's still not there yet in general. I will definitely remember function latency tracing the next time I'm doing performance work, but I expect that to be far too heavy to enable on a live server. This thing is only a couple hundred lines of code though, so perhaps tracing shouldn't be the only tool in our toolbox :)