On Thu, 21 Mar 2024 09:36:22 -0700 Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:

> Low overhead [1] per-callsite memory allocation profiling. Not just for
> debug kernels, overhead low enough to be deployed in production.
>
> Example output:
>   root@moria-kvm:~# sort -rn /proc/allocinfo
>    127664128    31168 mm/page_ext.c:270 func:alloc_page_ext
>     56373248     4737 mm/slub.c:2259 func:alloc_slab_page
>     14880768     3633 mm/readahead.c:247 func:page_cache_ra_unbounded
>     14417920     3520 mm/mm_init.c:2530 func:alloc_large_system_hash
>     13377536      234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
>     11718656     2861 mm/filemap.c:1919 func:__filemap_get_folio
>      9192960     2800 kernel/fork.c:307 func:alloc_thread_stack_node
>      4206592        4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
>      4136960     1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
>      3940352      962 mm/memory.c:4214 func:alloc_anon_folio
>      2894464    22613 fs/kernfs/dir.c:615 func:__kernfs_new_node

Did you consider adding a knob to permit all the data to be wiped out?
So people can zap everything, run the chosen workload, then go see what
happened?

Of course, this can be done in userspace by taking a snapshot before and
after, then crunching on the two....
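As a rough illustration of the userspace alternative, here is a hypothetical sketch (not part of the patchset) that parses two /proc/allocinfo snapshots and reports per-callsite deltas. It assumes the "<bytes> <calls> <file:line> func:<name>" layout shown in the example output above; field names and helpers are invented for the example.

```python
#!/usr/bin/env python3
# Hypothetical snapshot-and-diff sketch: capture /proc/allocinfo before
# and after a workload, then report per-callsite growth, avoiding any
# need for an in-kernel reset knob.

def parse_allocinfo(text):
    """Map each callsite tag to its (bytes, calls) counters."""
    sites = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        nbytes, calls = int(fields[0]), int(fields[1])
        # Remaining fields identify the callsite,
        # e.g. "mm/slub.c:2259 func:alloc_slab_page"
        tag = " ".join(fields[2:])
        sites[tag] = (nbytes, calls)
    return sites

def diff_allocinfo(before, after):
    """Per-callsite (bytes, calls) change between two snapshots."""
    deltas = {}
    for tag, (nbytes, calls) in after.items():
        old_bytes, old_calls = before.get(tag, (0, 0))
        delta = (nbytes - old_bytes, calls - old_calls)
        if delta != (0, 0):
            deltas[tag] = delta
    return deltas

if __name__ == "__main__":
    import subprocess, sys
    before = parse_allocinfo(open("/proc/allocinfo").read())
    subprocess.run(sys.argv[1:])  # run the chosen workload
    after = parse_allocinfo(open("/proc/allocinfo").read())
    # Print callsites sorted by bytes grown, largest first
    for tag, (db, dc) in sorted(diff_allocinfo(before, after).items(),
                                key=lambda kv: -kv[1][0]):
        print(f"{db:>12} {dc:>8} {tag}")
```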