On Wed, May 10, 2023 at 12:28 PM Kent Overstreet
<kent.overstreet@xxxxxxxxx> wrote:
>
> On Tue, Mar 28, 2023 at 06:28:21PM +0200, Vlastimil Babka wrote:
> > On 2/22/23 20:31, Suren Baghdasaryan wrote:
> > > We would like to continue the discussion about code tagging use for
> > > memory allocation profiling. The code tagging framework [1] and its
> > > applications were posted as an RFC [2] and discussed at LPC 2022. It
> > > has many applications proposed in the RFC but we would like to focus
> > > on its application for memory profiling. It can be used as a
> > > low-overhead solution to track memory leaks, rank memory consumers by
> > > the amount of memory they use, identify memory allocation hot paths
> > > and possible other use cases.
> > > Kent Overstreet and I worked on simplifying the solution, minimizing
> > > the overhead and implementing features requested during RFC review.
> >
> > IIRC one large objection was the use of page_ext, I don't recall if you
> > found another solution to that?
>
> Hasn't been addressed yet, but we were just talking about moving the
> codetag pointer from page_ext to page last night for memory overhead
> reasons.
>
> The disadvantage then is that the memory overhead doesn't go down if you
> disable memory allocation profiling at boot time...
>
> But perhaps the performance overhead is low enough now that this is not
> something we expect to be doing as much?
>
> Choices, choices...

I would like to participate in this discussion, specifically to discuss
how to make this profiling applicable at scale, where we have many
machines in the fleet and the memory and performance overheads must be
much smaller than what is currently proposed.

There are several ideas that we can discuss:

1. Filtering the files that get tagged at build time. For example, if a
   specific driver does not need to be tagged, it can be filtered out
   during the build.

2. Reducing the memory overhead by not using a page_ext pointer and
   instead using n bits in page->flags. The number of buckets is
   actually not that large, so there is no need to keep an 8-byte
   pointer in page_ext; it could be an index into an array of a fixed
   size. Some buckets could contain several stacks.

3. Using static branches for performance optimizations, especially for
   the cases when profiling is disabled.

4. Optionally enabling profiling only for specific allocators:
   kmalloc/pgalloc/vmalloc/pcp etc.

Pasha
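
P.S. To make idea 2 above a bit more concrete, here is a minimal
userspace sketch, not kernel code. It assumes a 64-bit flags word whose
top bits are free, and stores a small bucket index there instead of an
8-byte pointer. All names (TAG_BITS, tag_table, flags_set_tag(),
flags_get_tag(), account_alloc(), account_free()) and the chosen width
of 10 bits are hypothetical and only for illustration; how many bits
are actually available in page->flags depends on the kernel config.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the top TAG_BITS bits of a 64-bit flags word are unused. */
#define TAG_BITS  10u
#define TAG_SHIFT (64u - TAG_BITS)
#define TAG_MASK  ((((uint64_t)1 << TAG_BITS) - 1) << TAG_SHIFT)
#define NR_TAGS   (1u << TAG_BITS)

/* One bucket per index; several call sites could share a bucket. */
struct tag_counter {
	uint64_t bytes;	/* bytes currently accounted to this bucket */
	uint64_t calls;	/* number of allocations seen */
};

static struct tag_counter tag_table[NR_TAGS];

/* Store idx in the tag field, leaving all other flag bits untouched. */
static inline uint64_t flags_set_tag(uint64_t flags, unsigned int idx)
{
	assert(idx < NR_TAGS);
	return (flags & ~TAG_MASK) | ((uint64_t)idx << TAG_SHIFT);
}

/* Read the bucket index back out of the flags word. */
static inline unsigned int flags_get_tag(uint64_t flags)
{
	return (unsigned int)((flags & TAG_MASK) >> TAG_SHIFT);
}

/* Charge an allocation to bucket idx and record idx in the flags. */
static inline uint64_t account_alloc(uint64_t flags, unsigned int idx,
				     uint64_t bytes)
{
	assert(idx < NR_TAGS);
	tag_table[idx].bytes += bytes;
	tag_table[idx].calls++;
	return flags_set_tag(flags, idx);
}

/* Uncharge a free using only the index recovered from the flags. */
static inline void account_free(uint64_t flags, uint64_t bytes)
{
	tag_table[flags_get_tag(flags)].bytes -= bytes;
}
```

With 10 bits this gives 1024 buckets at a cost of zero extra bytes per
page, versus 8 bytes per page for a pointer. The trade-off is that the
index space is fixed up front, which is why some buckets may need to
hold several stacks.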