On Sun, 4 Sep 2022 18:32:58 -0700 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:

> Page allocations (overheads are compared to get_free_pages() duration):
> 6.8% Codetag counter manipulations (__lazy_percpu_counter_add + __alloc_tag_add)
> 8.8% lookup_page_ext
> 1237% call stack capture
> 139% tracepoint with attached empty BPF program

Have you tried a tracepoint with a custom callback?

static void my_callback(void *data, unsigned long call_site,
			const void *ptr, struct kmem_cache *s,
			size_t bytes_req, size_t bytes_alloc,
			gfp_t gfp_flags)
{
	struct my_data_struct *my_data = data;

	/* do whatever */
}

[..]
	register_trace_kmem_alloc(my_callback, my_data);

Now the my_callback function will be called directly every time the
kmem_alloc tracepoint is hit. This avoids the perf and BPF overhead.

-- Steve
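
For completeness, here is a minimal, untested sketch of how such a probe
could be wired up from a module. The register_trace_kmem_alloc() name and
the probe prototype follow the example above; the exact helper name and
arguments depend on the kernel version's include/trace/events/kmem.h, and
my_callback's body here (a simple byte counter) is just an illustration:

/*
 * Sketch only: register a probe on the kmem_alloc tracepoint from a
 * module and sum the requested bytes.  Names/prototypes may differ
 * between kernel versions.
 */
#include <linux/module.h>
#include <linux/atomic.h>
#include <linux/tracepoint.h>
#include <trace/events/kmem.h>

static atomic64_t total_bytes_req = ATOMIC64_INIT(0);

static void my_callback(void *data, unsigned long call_site,
			const void *ptr, struct kmem_cache *s,
			size_t bytes_req, size_t bytes_alloc,
			gfp_t gfp_flags)
{
	/* Runs directly in the allocation path; keep it cheap. */
	atomic64_add(bytes_req, (atomic64_t *)data);
}

static int __init my_probe_init(void)
{
	return register_trace_kmem_alloc(my_callback, &total_bytes_req);
}

static void __exit my_probe_exit(void)
{
	unregister_trace_kmem_alloc(my_callback, &total_bytes_req);
	/* Wait for in-flight probe calls to finish before unloading. */
	tracepoint_synchronize_unregister();
	pr_info("bytes requested: %lld\n", atomic64_read(&total_bytes_req));
}

module_init(my_probe_init);
module_exit(my_probe_exit);
MODULE_LICENSE("GPL");

The data pointer passed at registration time comes back as the first
argument of the callback, so per-probe state can be carried without any
globals if preferred.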