BTW, if anyone's interested, there's a "benchmark" trace event when you enable: CONFIG_TRACEPOINT_BENCHMARK=y You can see the code in: kernel/trace/trace_benchmark.c: Which does a loop of: local_irq_disable(); start = trace_clock_local(); trace_benchmark_event(bm_str, bm_last); stop = trace_clock_local(); local_irq_enable(); Where it writes the result of the previous timings into the current trace event via the bm_str: delta = stop - start; [..] bm_last = delta; [..] scnprintf(bm_str, BENCHMARK_EVENT_STRLEN, "last=%llu first=%llu max=%llu min=%llu avg=%u std=%d std^2=%lld", bm_last, bm_first, bm_max, bm_min, avg, std, stddev); I ran: perf record -a -e benchmark:benchmark_event sleep 20 and perf script produces (I scrolled down to get to hot cache): event_benchmark 2289 [001] 672.581425: benchmark:benchmark_event: last=247 first=5693 max=8969 min=204 avg=240 std=234 std^2=55157 delta=247 event_benchmark 2289 [001] 672.581426: benchmark:benchmark_event: last=222 first=5693 max=8969 min=204 avg=240 std=234 std^2=55151 delta=222 event_benchmark 2289 [001] 672.581427: benchmark:benchmark_event: last=229 first=5693 max=8969 min=204 avg=240 std=234 std^2=55144 delta=229 event_benchmark 2289 [001] 672.581427: benchmark:benchmark_event: last=221 first=5693 max=8969 min=204 avg=240 std=234 std^2=55138 delta=221 event_benchmark 2289 [001] 672.581428: benchmark:benchmark_event: last=223 first=5693 max=8969 min=204 avg=240 std=234 std^2=55131 delta=223 event_benchmark 2289 [001] 672.581428: benchmark:benchmark_event: last=220 first=5693 max=8969 min=204 avg=240 std=234 std^2=55125 delta=220 event_benchmark 2289 [001] 672.581429: benchmark:benchmark_event: last=215 first=5693 max=8969 min=204 avg=240 std=234 std^2=55118 delta=215 event_benchmark 2289 [001] 672.581430: benchmark:benchmark_event: last=221 first=5693 max=8969 min=204 avg=240 std=234 std^2=55112 delta=221 event_benchmark 2289 [001] 672.581430: benchmark:benchmark_event: last=240 first=5693 max=8969 min=204 avg=240 std=234 std^2=55105 delta=240 event_benchmark 2289 [001] 672.581431: benchmark:benchmark_event: last=225 first=5693 max=8969 min=204 avg=240 std=234 std^2=55099 delta=225 event_benchmark 2289 [001] 672.581432: benchmark:benchmark_event: last=235 first=5693 max=8969 min=204 avg=240 std=234 std^2=55092 delta=235 event_benchmark 2289 [001] 672.581432: benchmark:benchmark_event: last=220 first=5693 max=8969 min=204 avg=240 std=234 std^2=55086 delta=220 event_benchmark 2289 [001] 672.581433: benchmark:benchmark_event: last=245 first=5693 max=8969 min=204 avg=240 std=234 std^2=55079 delta=245 event_benchmark 2289 [001] 672.581433: benchmark:benchmark_event: last=215 first=5693 max=8969 min=204 avg=240 std=234 std^2=55073 delta=215 event_benchmark 2289 [001] 672.581434: benchmark:benchmark_event: last=216 first=5693 max=8969 min=204 avg=240 std=234 std^2=55066 delta=216 For ftrace: trace-cmd record -e benchmark_event sleep 20 trace-cmd report: event_benchmark-2253 [000] 549.747068: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=78 event_benchmark-2253 [000] 549.747069: benchmark_event: last=79 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=79 event_benchmark-2253 [000] 549.747069: benchmark_event: last=72 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=72 event_benchmark-2253 [000] 549.747069: benchmark_event: last=79 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=79 event_benchmark-2253 [000] 549.747070: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=78 event_benchmark-2253 [000] 549.747070: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=78 event_benchmark-2253 [000] 549.747071: benchmark_event: last=79 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=79 event_benchmark-2253 [000] 549.747071: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=78 event_benchmark-2253 [000] 549.747072: benchmark_event: last=80 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=80 event_benchmark-2253 [000] 549.747072: benchmark_event: last=79 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=79 event_benchmark-2253 [000] 549.747073: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=933 delta=78 event_benchmark-2253 [000] 549.747073: benchmark_event: last=165 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=165 event_benchmark-2253 [000] 549.747074: benchmark_event: last=79 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=79 event_benchmark-2253 [000] 549.747074: benchmark_event: last=153 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=153 event_benchmark-2253 [000] 549.747075: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=78 event_benchmark-2253 [000] 549.747075: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=78 event_benchmark-2253 [000] 549.747076: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=935 delta=78 event_benchmark-2253 [000] 549.747076: benchmark_event: last=73 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=73 event_benchmark-2253 [000] 549.747077: benchmark_event: last=79 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=79 event_benchmark-2253 [000] 549.747077: benchmark_event: last=78 first=2674 max=1185 min=71 avg=84 std=30 std^2=934 delta=78 For normal tracing, the average perf event takes 204ns per event, and the average ftrace event takes 84ns per event. The "first" in the output above is how long the first event took (which was cold cache). I added filtering to trace-cmd with: trace-cmd record -o trace-filter.dat -e benchmark_event -f 'delta & 1' sleep 20 I should modify the event to have a counter so that I can filter every other event with that, but for now I just print out anything that has an odd delta. event_benchmark-2548 [000] 1558.776493: benchmark_event: str=last=199 first=1964 max=2215 min=40 avg=78 std=44 std^2=2022 delta=199 event_benchmark-2548 [000] 1558.776498: benchmark_event: str=last=43 first=1964 max=2215 min=40 avg=78 std=44 std^2=2021 delta=43 event_benchmark-2548 [000] 1558.776498: benchmark_event: str=last=191 first=1964 max=2215 min=40 avg=78 std=44 std^2=2022 delta=191 event_benchmark-2548 [000] 1558.776500: benchmark_event: str=last=41 first=1964 max=2215 min=40 avg=78 std=44 std^2=2022 delta=41 event_benchmark-2548 [000] 1558.776500: benchmark_event: str=last=119 first=1964 max=2215 min=40 avg=78 std=44 std^2=2022 delta=119 event_benchmark-2548 [000] 1558.776501: benchmark_event: str=last=41 first=1964 max=2215 min=40 avg=78 std=44 std^2=2022 delta=41 event_benchmark-2548 [000] 1558.776502: benchmark_event: str=last=105 first=1964 max=2215 min=40 avg=78 std=44 std^2=2022 delta=105 event_benchmark-2548 [000] 1558.776503: benchmark_event: str=last=41 first=1964 max=2215 min=40 avg=78 std=44 std^2=2022 delta=41 event_benchmark-2548 [000] 1558.776505: benchmark_event: str=last=41 first=1964 max=2215 min=40 avg=78 std=44 std^2=2021 delta=41 event_benchmark-2548 [000] 1558.776505: benchmark_event: str=last=111 first=1964 max=2215 min=40 avg=78 std=44 std^2=2021 delta=111 event_benchmark-2548 [000] 1558.776506: benchmark_event: str=last=109 first=1964 max=2215 min=40 avg=78 std=44 std^2=2021 delta=109 event_benchmark-2548 [000] 1558.776506: benchmark_event: str=last=109 first=1964 max=2215 min=40 avg=78 std=44 std^2=2021 delta=109 event_benchmark-2548 [000] 1558.776508: benchmark_event: str=last=41 first=1964 max=2215 min=40 avg=78 std=44 std^2=2021 delta=41 event_benchmark-2548 [000] 1558.776508: benchmark_event: str=last=109 first=1964 max=2215 min=40 avg=78 std=44 std^2=2021 delta=109 event_benchmark-2548 [000] 1558.776509: benchmark_event: str=last=117 first=1964 max=2215 min=40 avg=78 std=44 std^2=2021 delta=117 event_benchmark-2548 [000] 1558.776510: benchmark_event: str=last=51 first=1964 max=2215 min=40 avg=78 std=44 std^2=2020 delta=51 event_benchmark-2548 [000] 1558.776510: benchmark_event: str=last=103 first=1964 max=2215 min=40 avg=78 std=44 std^2=2020 delta=103 event_benchmark-2548 [000] 1558.776511: benchmark_event: str=last=109 first=1964 max=2215 min=40 avg=78 std=44 std^2=2020 delta=109 event_benchmark-2548 [000] 1558.776512: benchmark_event: str=last=51 first=1964 max=2215 min=40 avg=78 std=44 std^2=2020 delta=51 event_benchmark-2548 [000] 1558.776512: benchmark_event: str=last=103 first=1964 max=2215 min=40 avg=78 std=44 std^2=2020 delta=103 event_benchmark-2548 [000] 1558.776513: benchmark_event: str=last=95 first=1964 max=2215 min=40 avg=78 std=44 std^2=2020 delta=95 event_benchmark-2548 [000] 1558.776513: benchmark_event: str=last=101 first=1964 max=2215 min=40 avg=78 std=44 std^2=2020 delta=101 event_benchmark-2548 [000] 1558.776514: benchmark_event: str=last=109 first=1964 max=2215 min=40 avg=78 std=44 std^2=2019 delta=109 It looks like throwing away an event is around 40-50ns, where as now the copying of the event into a temp buffer before writing it into the ring buffer increased the time from 84ns to around 100-110ns. Still half the time it takes for the perf event. The above trace event benchmark has been part of the Linux kernel since 3.16, so everyone should have it if you want to run your own tests. -- Steve