On Thu 01-09-22 16:15:02, Kent Overstreet wrote: > On Thu, Sep 01, 2022 at 12:39:11PM -0700, Suren Baghdasaryan wrote: > > kmemleak is known to be slow and it's even documented [1], so I hope I > > can skip that part. For page_owner to provide the comparable > > information we would have to capture the call stacks for all page > > allocations unlike our proposal which allows to do that selectively > > for specific call sites. I'll post the overhead numbers of call stack > > capturing once I'm finished with profiling the latest code, hopefully > > sometime tomorrow, in the worst case after the long weekend. > > To expand on this further: we're stashing a pointer to the alloc_tag, which is > defined at the allocation callsite. That's how we're able to decrement the > proper counter on free, and why this beats any tracing based approach - with > tracing you'd instead have to correlate allocate/free events. Ouch. > > > > Yes, tracking back the call trace would be really needed. The question > > > is whether this is really prohibitively expensive. How much overhead are > > > we talking about? There is no free lunch here, really. You either have > > > the overhead during runtime when the feature is used or on the source > > > code level for all the future development (with a maze of macros and > > > wrappers). > > The full call stack is really not what you want in most applications - that's > what people think they want at first, and why page_owner works the way it does, > but it turns out that then combining all the different but related stack traces > _sucks_ (so why were you saving them in the first place?), and then you have to > do a separate memory allocate for each stack track, which destroys performance. I do agree that the full stack trace is likely not what you need. But the portion of the stack that you need is not really clear because the relevant part might be on a different level of the calltrace depending on the allocation site. Take this as an example: {traverse, seq_read_iter, single_open_size}->seq_buf_alloc -> kvmalloc -> kmalloc This whole part of the stack is not really all that interesting and you would have to allocate pretty high at the API layer to catch something useful. And please remember that seq_file interface is heavily used in throughout the kernel. I wouldn't suspect seq_file itself to be buggy, that is well exercised code but its users can botch things and that is where the leak would happen. There are many other examples like that where the allocation is done at a lib/infrastructure layer (sysfs framework, mempools network pool allocators and whatnot). We do care about those users, really. Ad-hoc pool allocators built on top of the core MM allocators are not really uncommon. And I am really skeptical we really want to add all the tagging source code level changes to each and every one of them. This is really my main concern about this whole work. Not only it adds a considerable maintenance burden to the core MM because it adds on top of our existing allocator layers complexity but it would need to spread beyond MM to be useful because it is usually outside of MM where leaks happen. -- Michal Hocko SUSE Labs