On Mon, Sep 05, 2022 at 11:08:21AM -0700, Suren Baghdasaryan wrote:
> On Mon, Sep 5, 2022 at 8:06 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:
> >
> > On Sun, 4 Sep 2022 18:32:58 -0700
> > Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> >
> > > Page allocations (overheads are compared to get_free_pages() duration):
> > > 6.8% Codetag counter manipulations (__lazy_percpu_counter_add + __alloc_tag_add)
> > > 8.8% lookup_page_ext
> > > 1237% call stack capture
> > > 139% tracepoint with attached empty BPF program
> >
> > Have you tried tracepoint with custom callback?
> >
> > static void my_callback(void *data, unsigned long call_site,
> >                         const void *ptr, struct kmem_cache *s,
> >                         size_t bytes_req, size_t bytes_alloc,
> >                         gfp_t gfp_flags)
> > {
> >         struct my_data_struct *my_data = data;
> >
> >         { do whatever }
> > }
> >
> > [..]
> >         register_trace_kmem_alloc(my_callback, my_data);
> >
> > Now the my_callback function will be called directly every time the
> > kmem_alloc tracepoint is hit.
> >
> > This avoids that perf and BPF overhead.
>
> Haven't tried that yet but will do. Thanks for the reference code!

Is it really worth the effort of benchmarking tracing API overhead here?

The main cost of a tracing based approach is going to be the data structure
for remembering outstanding allocations so that free events can be matched to
the appropriate callsite. Regardless of whether it's done with BPF or by
attaching to the tracepoints directly, that's going to be the main overhead.
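
To make that cost concrete, here is a rough userspace sketch of the kind of
bookkeeping a tracing based approach would need: a map from allocated pointer
to callsite, updated on every alloc event and looked up on every free event.
Everything in it is hypothetical and for illustration only (the names
track_alloc, track_free, struct outstanding, the bucket count and the hash);
it is not code from this patch series or from any existing tracer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 1024 /* hypothetical table size, must be a power of two */

/* One record per allocation that has not been freed yet. */
struct outstanding {
	const void *ptr;          /* address returned by the allocator */
	unsigned long call_site;  /* where the allocation came from */
	size_t bytes;             /* size charged to that callsite */
	struct outstanding *next; /* hash chain */
};

static struct outstanding *buckets[NBUCKETS];

/* Simple multiplicative hash of the pointer value (illustrative only). */
static size_t hash_ptr(const void *ptr)
{
	uint64_t v = (uint64_t)(uintptr_t)ptr >> 4;

	v *= 0x9E3779B97F4A7C15ull;
	return (size_t)(v >> 32) & (NBUCKETS - 1);
}

/* Alloc event: remember which callsite owns ptr. Returns 0, or -1 on OOM. */
static int track_alloc(const void *ptr, unsigned long call_site, size_t bytes)
{
	struct outstanding *o;
	size_t b = hash_ptr(ptr);

	assert(ptr != NULL);
	o = malloc(sizeof(*o));
	if (!o)
		return -1;
	o->ptr = ptr;
	o->call_site = call_site;
	o->bytes = bytes;
	o->next = buckets[b];
	buckets[b] = o;
	return 0;
}

/*
 * Free event: find the record for ptr, report its callsite and size, and
 * drop it. Returns 1 if ptr was being tracked, 0 otherwise.
 */
static int track_free(const void *ptr, unsigned long *call_site, size_t *bytes)
{
	struct outstanding **pp = &buckets[hash_ptr(ptr)];

	for (; *pp; pp = &(*pp)->next) {
		struct outstanding *o = *pp;

		if (o->ptr != ptr)
			continue;
		*call_site = o->call_site;
		*bytes = o->bytes;
		*pp = o->next;
		free(o);
		return 1;
	}
	return 0;
}
```

Even in this simplified form, every alloc and every free pays for a hash, a
chain walk and a record allocation or release. An in-kernel version would
additionally need some form of synchronization and could not simply call an
allocator from inside the allocation tracepoint, and that work is there no
matter which tracing API delivers the events.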