Hi all,

I need to send trace data from in-kernel eBPF programs to user space, and I have two options. The first is to store traces in a BPF_MAP_TYPE_ARRAY map and have the user-space consumer pull them from it. The second is to emit traces with bpf_perf_event_output() through a BPF_MAP_TYPE_PERF_EVENT_ARRAY map and have the user-space consumer read them in the callback invoked by perf_buffer__poll().

I ran simple performance tests and found a significant performance difference between the two methods.

For the first method (pulling from a BPF_MAP_TYPE_ARRAY map), I set up an array map with 65536 entries and a 40-byte value size, then pulled entries from user space 10M times. This takes 3.7 s on average.

For the second method, I allocated 2 MB for the perf buffer of each CPU, and the user-space consumer calls perf_buffer__poll() in an infinite loop. To generate enough traces, I attached an eBPF program to the sys_enter_read tracepoint that generates 100 traces per invocation. I then ran a user-space program that calls the read() system call in an infinite loop to trigger it. Receiving 10M traces with perf_buffer__poll() takes more than 10 seconds, which is much slower than polling the array.

This blog post (https://nakryiko.com/posts/bpf-ringbuf/) says that the BPF perf buffer lets user space read data efficiently through a memory-mapped region, without extra memory copying and/or syscalls into the kernel. So I expected it to be faster than reading the array map, which needs to invoke the bpf() system call. My test gives the opposite result.

I ran this test on a server with 48 CPU cores and 188 GB of memory, running Ubuntu 20.04 with kernel 5.4.0.

Is this result expected, or did I overlook something? Thank you for your help!

Chang Liu
Tsinghua University, China