On Mon 01-05-23 09:54:10, Suren Baghdasaryan wrote:
> Memory allocation profiling infrastructure provides a low overhead
> mechanism to make all kernel allocations in the system visible. It can be
> used to monitor memory usage, track memory hotspots, detect memory leaks,
> identify memory regressions.
>
> To keep the overhead to the minimum, we record only allocation sizes for
> every allocation in the codebase. With that information, if users are
> interested in more detailed context for a specific allocation, they can
> enable in-depth context tracking, which includes capturing the pid, tgid,
> task name, allocation size, timestamp and call stack for every allocation
> at the specified code location.
[...]
> Implementation utilizes a more generic concept of code tagging, introduced
> as part of this patchset. Code tag is a structure identifying a specific
> location in the source code which is generated at compile time and can be
> embedded in an application-specific structure. A number of applications
> for code tagging have been presented in the original RFC [1].
> Code tagging uses the old trick of "define a special elf section for
> objects of a given type so that we can iterate over them at runtime" and
> creates a proper library for it.
>
> To profile memory allocations, we instrument page, slab and percpu
> allocators to record total memory allocated in the associated code tag at
> every allocation in the codebase. Every time an allocation is performed by
> an instrumented allocator, the code tag at that location increments its
> counter by allocation size. Every time the memory is freed the counter is
> decremented. To decrement the counter upon freeing, allocated object needs
> a reference to its code tag. Page allocators use page_ext to record this
> reference while slab allocators use memcg_data (renamed into more generic
> slabobj_ext) of the slab page.
[...]
> [1] https://lore.kernel.org/all/20220830214919.53220-1-surenb@xxxxxxxxxx/
[...]
> 70 files changed, 2765 insertions(+), 554 deletions(-)

Sorry for cutting the cover letter considerably, but I believe I have
quoted the most important/interesting parts here.

The approach is not fundamentally different from the previous version
[1], and there was a significant discussion around this approach. The
cover letter neither summarizes nor deals with the concerns expressed
previously, AFAICS. So let me bring those back up. At least those I find
the most important:
- This is a big change and it adds a significant maintenance burden
  because each allocation entry point needs to be handled specifically.
  The cost will grow with the intended coverage, especially where the
  allocation is hidden in library code.
- It has been brought up that this is duplicating functionality already
  available via existing tracing infrastructure. You should make it very
  clear why that is not suitable for the job.
- We already have page_owner infrastructure that provides allocation
  tracking data. Why can it not be used/extended?

Thanks!
-- 
Michal Hocko
SUSE Labs