On Thu, 1 Sep 2022 17:54:38 -0400 Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote: > > So this looks like it's gotten better since I last looked, but it's still not > there yet. > > Part of the problem is that the tracepoints themselves are in the wrong place: > your end event is when a task is woken up, but that means spurious wakeups will The end event is when a task is scheduled onto the CPU. The start event is the first time it is woken up. > cause one wait_event() call to be reported as multiple smaller waits, not one > long wait - oops, now I can't actually find the thing that's causing my > multi-second delay. > > Also, in your example you don't have it broken out by callsite. That would be > the first thing I'd need for any real world debugging. OK, how about this (currently we can only have 3 keys, but you can create multiple histograms on the same event). # echo 'hist:keys=comm,stacktrace,delta.buckets=10:sort=delta' > /sys/kernel/tracing/events/synthetic/wakeup_lat/trigger (notice the "stacktrace" in the keys) # cat /sys/kernel/tracing/events/synthetic/wakeup_lat/hist # event histogram # # trigger info: hist:keys=comm,stacktrace,delta.buckets=10:vals=hitcount:sort=delta.buckets=10:size=2048 [active] # { comm: migration/2 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/5 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/1 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/7 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/0 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 start_kernel+0x595/0x5be secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: migration/4 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: rtkit-daemon , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 preempt_schedule_common+0x2d/0x70 preempt_schedule_thunk+0x16/0x18 _raw_spin_unlock_irq+0x2e/0x40 eventfd_write+0xc8/0x290 vfs_write+0xc0/0x2a0 ksys_write+0x5f/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb , delta: ~ 10-19} hitcount: 1 { comm: migration/6 , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 10-19} hitcount: 7 { comm: rtkit-daemon , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 20-29} hitcount: 1 { comm: rtkit-daemon , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 preempt_schedule_common+0x2d/0x70 preempt_schedule_thunk+0x16/0x18 _raw_spin_unlock_irq+0x2e/0x40 eventfd_write+0xc8/0x290 vfs_write+0xc0/0x2a0 ksys_write+0x5f/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb , delta: ~ 30-39} hitcount: 1 { comm: rtkit-daemon , stacktrace: event_hist_trigger+0x290/0x2b0 event_triggers_call+0x52/0xe0 trace_event_buffer_commit+0x193/0x240 trace_event_raw_event_sched_switch+0x120/0x180 __traceiter_sched_switch+0x39/0x50 __schedule+0x310/0x700 schedule_idle+0x26/0x40 do_idle+0xb4/0xd0 cpu_startup_entry+0x19/0x20 secondary_startup_64_no_verify+0xc2/0xcb , delta: ~ 40-49} hitcount: 1 Totals: Hits: 53 Entries: 11 Dropped: 0 Not the prettiest thing to read. But hey, we got the full stack of where these latencies happened! Yes, it adds some overhead when the events are triggered due to the stacktrace code, but it's extremely useful information. > > So, it looks like tracing has made some progress over the past 10 years, > but for debugging latency issues it's still not there yet in general. I I call BS on that statement. Just because you do not know what has been added to the kernel in the last 10 years (like you had no idea about seq_buf and that was added in 2014) means to me that you are totally clueless on what tracing can and can not do. It appears to me that you are too focused on inventing your own wheel that does exactly what you want before looking to see how things are today. Just because something didn't fit your needs 10 years ago doesn't mean that it can't fit your needs today. > will definitely remember function latency tracing the next time I'm doing > performance work, but I expect that to be far too heavy to enable on a > live server. I run it on production machines all the time. With the filtering in place it has very little overhead. Mostly in the noise. The best part is that it has practically zero overhead (but can add some cache pressure) when it's off, and can be turned on at run time. The tracing infrastructure is very modular, you can use parts of it that you need, without the overhead of other parts. Like you found out this week that tracepoints are not the same as trace events. Because tracepoints are just a hook in the code that anything can attach to (that's what Daniel's RV work does). Trace events provide the stored data to be recorded. I will note that the current histogram code overhead has increased due to retpolines, but I have code to convert them from indirect calls to direct calls via a switch statement which drops the overhead by 20%! https://lore.kernel.org/all/20220823214606.344269352@xxxxxxxxxxx/ > > This thing is only a couple hundred lines of code though, so perhaps > tracing shouldn't be the only tool in our toolbox :) I'm already getting complaints from customers/users that are saying there's too many tools in the toolbox already. (Do we use ftrace/perf/bpf?). The idea is to have the tools using mostly the same infrastructure, and not be 100% off on its own, unless there's a clear reason to invent a new wheel that several people are asking for, not just one or two. -- Steve