Hello,

I'd like to discuss a few BPF issues in perf tools (and the kernel). The perf tools already make use of BPF programs for various tracing and filtering work. While these are all great, there is still room for improvement.

1. Allowing unprivileged access to BPF for perf events

The perf_event subsystem allows non-root (!CAP_PERFMON) users to open events, with restrictions, in order to measure performance counts for their own processes. On the other hand, the BPF event filter [1] can be used to accept or reject samples based on the content of the sample. It's almost the same as the classic BPF socket filter. But without CAP_BPF, normal users cannot use the BPF filter for their perf events.

I noticed there's ongoing work on the BPF token for unprivileged use cases, but it seems to focus on "trusted" container use cases, and I'm not sure it would fit the perf use case well. Note that this case would need to allow arbitrary users, and therefore needs limited functionality that can access only the given sample data.

2. Enhancing stack trace

Sometimes it fails to get the build-ID and offset for user stack traces because of mmap_lock contention. As BPF programs can run in atomic context, they cannot wait for the lock to get the build-ID and offset. There's also a chance of hitting a page fault on a user page, which also makes the stack trace stop.

I wonder if we can improve this situation using the deferred stack trace proposed for SFrame [2] last year. IIUC it wasn't designed with BPF in mind, but I think it can be useful for stack traces with FP as well. It would also be able to avoid duplicating the same user stack if the process runs in kernel context for a while. The question is how to defer them and how to connect them.

Another (minor) issue with stack traces is to add one more (missing) helper. IIUC there are 3 stack trace helpers: bpf_get_stack(), bpf_get_stackid() and bpf_get_task_stack().
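To make the "single ID per stack trace" idea concrete, here is a minimal userspace sketch in plain C. It is not BPF code and not the kernel's implementation; get_stackid(), stack_table, MAX_DEPTH and NR_BUCKETS are hypothetical names made up for illustration. It only shows the general technique: hash the instruction pointers, use the hash to pick a bucket, and return the bucket index so that identical stacks share one ID.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_DEPTH   16   /* hypothetical limit on frames per stack */
#define NR_BUCKETS  64   /* hypothetical table size */

static_assert((NR_BUCKETS & (NR_BUCKETS - 1)) == 0,
              "NR_BUCKETS must be a power of two");

struct stack_entry {
    int      used;            /* bucket already holds a stack */
    uint32_t nr;              /* number of valid entries in ips[] */
    uint64_t ips[MAX_DEPTH];  /* saved instruction pointers */
};

static struct stack_entry stack_table[NR_BUCKETS];

/* FNV-1a hash over the raw bytes of the instruction pointers. */
static uint32_t hash_stack(const uint64_t *ips, uint32_t nr)
{
    const unsigned char *p = (const unsigned char *)ips;
    size_t len = (size_t)nr * sizeof(*ips);
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Return a small non-negative ID for the given stack.
 * The same stack always maps to the same ID.  Returns -1 for bad
 * input, or when a different stack already occupies the bucket
 * (this sketch does not try to resolve collisions).
 */
static int get_stackid(const uint64_t *ips, uint32_t nr)
{
    if (nr == 0 || nr > MAX_DEPTH)
        return -1;

    uint32_t id = hash_stack(ips, nr) & (NR_BUCKETS - 1);
    struct stack_entry *e = &stack_table[id];

    if (!e->used) {
        /* First time we see this bucket: store the stack. */
        e->used = 1;
        e->nr = nr;
        memcpy(e->ips, ips, (size_t)nr * sizeof(*ips));
        return (int)id;
    }

    /* Bucket in use: it is a hit only if the stacks are identical. */
    if (e->nr == nr && memcmp(e->ips, ips, (size_t)nr * sizeof(*ips)) == 0)
        return (int)id;

    return -1;
}
```

With something like this, a tool only needs to keep one integer per event, and repeated samples of the same stack cost no extra storage. How collisions are handled, and how the table is sized, are design choices this sketch leaves out.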
What I'd find useful is a fourth helper (bpf_get_task_stackid) that returns a single ID value for the stack trace of a given task. My use case is the perf lock contention tool [3], to get the stack trace of the owner of contended mutexes. Currently it just returns the TID of the owner, but it'd be nice to get its stack trace directly, as of when it went to sleep.

3. Lock symbol improvements

Actually this is not specific to BPF but applies to tracing in general. As I said, 'perf lock contention' uses BPF on a couple of tracepoints to track lock contention in the kernel. But one of the problems is that there's no symbol information for the lock. While lockdep saves it in the lock data structure, that's not something we can do in production. As the tracepoint has the address of the lock instance, the tool can check kallsyms for global locks, but dynamic locks are not handled. Currently it blindly tries to match the address against some well-known locks (including mmap_lock) from the task struct or global per-cpu symbols in BPF.

I'm curious if there's a better way to do it. I was thinking about BPF iterators to get the addresses of well-known locks, but that cannot handle all cases and might be racy.

Looking forward to more discussion on the perf and tracing topic.

Thanks,
Namhyung

[1] https://lore.kernel.org/r/20230314234237.3008956-1-namhyung@xxxxxxxxxx/
[2] https://lore.kernel.org/r/d5def69b0c88bcbe2a85d0e1fd6cfca62b472ed4.1699487758.git.jpoimboe@xxxxxxxxxx/
[3] https://lore.kernel.org/r/20230207002403.63590-1-namhyung@xxxxxxxxxx/