----- On Mar 30, 2022, at 3:15 PM, Beau Belgrave beaub@xxxxxxxxxxxxxxxxxxx wrote:

> On Wed, Mar 30, 2022 at 11:22:32AM -0700, Alexei Starovoitov wrote:
>> On Wed, Mar 30, 2022 at 9:34 AM Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
>> > > >
>> > > > But you are fine with uprobe costs? uprobes appear to be much more costly
>> > > > than a syscall approach on the hardware I've run on.
>>
>> Care to share the numbers?
>> uprobe over USDT is a single trap.
>> Not much slower compared to syscall with kpti.
>>
>
> Sure, these are the numbers we have from a production device.
>
> They are captured via perf via PERF_COUNT_HW_CPU_CYCLES.
> It's running a 20K loop emitting 4 bytes of data out.
> Each 4 byte event time is recorded via perf.
> At the end we have the total time and the max seen.
>
> null numbers represent a 20K loop with just perf start/stop ioctl costs.
>
> null: min=2863, avg=2953, max=30815
> uprobe: min=10994, avg=11376, max=146682
> uevent: min=7043, avg=7320, max=95396
> lttng: min=6270, avg=6508, max=41951
>
> These costs include the data getting into a buffer, so they represent
> what we would see in production vs the trap cost alone. For uprobe this
> means we created a uprobe and attached it via tracefs to get the above
> numbers.

[...]

I assume that by "lttng" you specifically mean lttng-ust (LTTng's
user-space tracer). Am I correct?

After subtracting the "null" baseline, my rough calculation of the average
overhead for lttng-ust in your results is (in CPU cycles):

6508 - 2953 = 3555

I don't know the frequency of your CPU, but guessing around 3.5 GHz, this
is in the area of 1 microsecond per event. On an Intel CPU, this is much
larger than what I would expect.

Can you share your test program, hardware characteristics, kernel version,
glibc version, and whether the program is compiled as a 32-bit or 64-bit
binary?

Can you confirm that lttng-ust is not issuing one getcpu system call per
event? This might be the case if you run a 32-bit x86 binary with
glibc < 2.35, or if your kernel is too old to provide CONFIG_RSEQ or does
not have CONFIG_RSEQ=y in its configuration. You can validate this by
running your lttng-ust test program under a system call tracer.

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
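
For reference, the baseline subtraction discussed above can be reproduced for all three tracers with a short script. This is only a sketch: the cycle counts are the ones quoted in this thread, the 3.5 GHz clock is the same guess as above (the actual CPU frequency was not reported), and the helper names are made up for illustration.

```python
# Per-event cycle counts quoted in this thread (PERF_COUNT_HW_CPU_CYCLES).
RESULTS = {
    "null":   {"min": 2863,  "avg": 2953,  "max": 30815},
    "uprobe": {"min": 10994, "avg": 11376, "max": 146682},
    "uevent": {"min": 7043,  "avg": 7320,  "max": 95396},
    "lttng":  {"min": 6270,  "avg": 6508,  "max": 41951},
}

# Assumption: the real CPU frequency is unknown; 3.5 GHz is only a guess.
ASSUMED_CPU_HZ = 3.5e9


def net_overhead_cycles(tracer, stat="avg"):
    """Cycles attributable to the tracer once the "null" baseline is removed."""
    return RESULTS[tracer][stat] - RESULTS["null"][stat]


def cycles_to_ns(cycles, cpu_hz=ASSUMED_CPU_HZ):
    """Convert a cycle count to nanoseconds at the assumed clock frequency."""
    return cycles / cpu_hz * 1e9


if __name__ == "__main__":
    for tracer in ("uprobe", "uevent", "lttng"):
        cycles = net_overhead_cycles(tracer)
        print(f"{tracer}: {cycles} cycles, ~{cycles_to_ns(cycles):.0f} ns per event")
```

Only the average is really meaningful here: the "max" values are dominated by outliers, so subtracting the baseline max from a tracer max says little about steady-state cost.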