On 3/4/22 3:28 PM, Namhyung Kim wrote:
Hello, While I'm working on lock contention tracepoints [1] for a future BPF use, I found some issues on the stack trace in BPF programs. Maybe there are things that I missed but I'd like to share my thoughts for your feedback. So please correct me if I'm wrong. The first thing I found is how it handles skipped frames in the bpf_get_stack{,id}. Initially I wanted a short stack trace like 4 depth to identify callers quickly, but it turned out that 4 is not enough and it's all filled with the BPF code itself. So I set to skip 4 frames but it always returns an error (-EFAULT). After some time I figured out that BPF doesn't allow to set skip frames greater than or equal to buffer size. This seems strange and looks like a bug. Then I found a bug report (and a partial fix) [2] and work on a full fix now.
Thanks for volunteering. Looking forward to the patch.
But it revealed another problem with BPF programs on perf_event which use a variant of stack trace functions. The difference is that it needs to use a callchain in the perf sample data. The perf callchain is saved from the begining while BPF callchain is saved at the last to limit the stack depth by the buffer size. But I can handle that. More important thing to me is the content of the (perf) callchain. If the event has __PERF_SAMPLE_CALLCHAIN_EARLY, it will have context info like PERF_CONTEXT_KERNEL. So user might or might not see it depending on whether the perf_event set with precise_ip and SAMPLE_CALLCHAIN. This doesn't look good.
Patch 7b04d6d60fcf ("bpf: Separate bpf_get_[stack|stackid] for perf events BPF") tried to fix __PERF_SAMPLE_CALLCHAIN_EARLY issue for bpf_get_stack[id]() helpers. The helpers will check whether event->attr.sample_type has __PERF_SAMPLE_CALLCHAIN_EARLY encoded or not, based on which the stacks will be retrieved accordingly. Did you any issue here?
After all, I think it'd be really great if we can skip those uninteresting info easily. Maybe we could add a flag to skip BPF code
We cannot just skip those callchains with __PERF_SAMPLE_CALLCHAIN_EARLY. There are real use cases for it.
perf context, and even some scheduler code from the trace respectively like in stack_trace_consume_entry_nosched().
A flag for the bpf_get_stack[id]() helpers? It is possible. It would be great if you can detail your use case here and how a flag could help you.
Thoughts? Thanks, Namhyung [1] https://lore.kernel.org/all/20220301010412.431299-1-namhyung@xxxxxxxxxx/ [2] https://lore.kernel.org/bpf/30a7b5d5-6726-1cc2-eaee-8da2828a9a9c@xxxxxxxxxx/