Hi, apologies in advance if I have missed an obvious information resource. I've used BCC/eBPF before to generate a tool that attached to sched_switch with performance counters and that traces the per-scheduling thread quanta changes in counters for threads of interest. I'd like to use libbbpf to create similar functionality to something like ... perf stat -e LLC-loads,LLC-load-misses,LLC-stores,LLC-prefetches -p PID --per-thread The actual counters are immaterial - except that I expect to be sometimes using RAW performance counters, and I typically extract the perf_event_attr information such as event/config etc using perf stat -vvv -e counter-name /bin/ls I can see from the libbpf examples how to catch the pid of the command as it enters the runqueue, but I dont see how to initialise the raw hardware performance counters to only count the pid and its threads (is this even possible or does one just accumulate counters for each relevant thread in sched_switch - this is what we do currently). I understand how to put these values into a map. I'm looking for (if possible) example code or information pointers on how to initialise and read the counters, not using sampling - presumably using a BPF_PERF_ARRAY map and a call to the perf_counter_read or the older perf_read on the map. I'm only just getting up to speed with libbpf, so apologies if I've misunderstood and presented BCC ways that do not apply. Thanks, Andy