On Wed, Mar 30, 2022 at 11:22:32AM -0700, Alexei Starovoitov wrote:
> On Wed, Mar 30, 2022 at 9:34 AM Beau Belgrave <beaub@xxxxxxxxxxxxxxxxxxx> wrote:
> > > >
> > > > But you are fine with uprobe costs? uprobes appear to be much more costly
> > > > than a syscall approach on the hardware I've run on.
>
> Care to share the numbers?
> uprobe over USDT is a single trap.
> Not much slower compared to syscall with kpti.
>

Sure, these are the numbers we have from a production device. They are
captured via perf via PERF_COUNT_HW_CPU_CYCLES. It's running a 20K loop
emitting 4 bytes of data out. Each 4 byte event time is recorded via perf.
At the end we have the total time and the max seen.

null numbers represent a 20K loop with just perf start/stop ioctl costs.

null:   min=2863,  avg=2953,  max=30815
uprobe: min=10994, avg=11376, max=146682
uevent: min=7043,  avg=7320,  max=95396
lttng:  min=6270,  avg=6508,  max=41951

These costs include the data getting into a buffer, so they represent what
we would see in production vs the trap cost alone. For uprobe this means we
created a uprobe and attached it via tracefs to get the above numbers.

There also seems to be some thinking around this as well from Song Liu.

Link: https://lore.kernel.org/lkml/20200801084721.1812607-1-songliubraving@xxxxxx/
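
For a rough per-event comparison, the null average can be subtracted from each of the other averages, since the null loop carries only the perf start/stop ioctl cost. The sketch below does just that arithmetic on the figures quoted above. The name `net_cost`, the assumption that the ioctl overhead is simply additive, and the reading of the values as CPU cycles are illustrative assumptions, not part of the original measurement.

```python
# Averages copied from the numbers quoted above. They are presumably CPU
# cycles, since they were captured with PERF_COUNT_HW_CPU_CYCLES.
AVG = {"null": 2953, "uprobe": 11376, "uevent": 7320, "lttng": 6508}


def net_cost(avg, baseline="null"):
    """Subtract the baseline average from every other mechanism's average.

    Hypothetical helper: assumes the perf start/stop ioctl overhead
    measured by the baseline loop is additive and identical in every run.
    """
    base = avg[baseline]
    return {name: value - base for name, value in avg.items() if name != baseline}


if __name__ == "__main__":
    net = net_cost(AVG)
    # Print cheapest mechanism first.
    for name, cycles in sorted(net.items(), key=lambda kv: kv[1]):
        print(f"{name}: {cycles} above null")
```

Under those assumptions the net averages are 8423 for uprobe, 4367 for uevent, and 3555 for lttng, which puts uprobe at a bit under twice the uevent cost once the perf ioctl overhead is removed.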