On Tue, Apr 26, 2022 at 04:02:19PM +1000, Dave Chinner wrote:
> This just seems like a solution looking for a problem to solve.
> Can you please describe the problem this infrastructure is going
> to solve?

A point I was making over VC is that memcg is completely irrelevant to
debugging most of these issues; all the issues we've been talking about can
be easily reproduced in a single test VM without memcg. Yet we don't even
have the tooling to debug the simple stuff. Why are we building big,
complicated stuff when we can't even debug the simple cases?

And I've been getting _really_ tired of the stock answer of "that use case
isn't interesting to the big cloud providers".

A: If you're a Linux kernel developer at this level, you have earned a great
deal of trust, and it is incumbent upon you to be a good steward of the code
you have been entrusted with, instead of spending all your time chasing fat
bonuses from your employer while ignoring what's good for the codebase as a
whole. That's pissing all over the commons that came long before you and
will hopefully still be around long after you.

B: Even aside from that, it's incredibly shortsighted and a poor use of time
and resources. When I was at Google I saw, over and over again, people
rushing to build something big, complicated and new because that was how
they could get a promotion, instead of working on basic stuff like
refactoring core IO paths (and it's been my experience, over and over again,
that when you just try to make code saner and more understandable, you
almost always find big performance improvements along the way... but that's
not as exciting as chasing the biggest, coolest optimization or the
all-the-bells-and-whistles interface).

So yeah, this patchset screams of someone looking for a promotion to me.

Meanwhile, the state of visibility into the _basics_ of what goes on in MM
is utter dogshit. There are just too many _basic_ questions that are a pain
in the ass to answer - even profiling memory usage by file:line number is a
shitshow.

One thing I run into a lot is people reaching for "tracepoints!" as the
answer - but tracepoints aren't a good answer for a lot of these problems,
because having them on all the time is problematic.

What I would like to see is more lightweight collection of statistics, and
some basic library code for things like latency measurements of important
operations, broken out by quantiles, with rate & frequency - this is
something that's helped in bcachefs. If anyone's interested, the code for
that starts here:

https://evilpiepirate.org/git/bcachefs.git/tree/fs/bcachefs/bcachefs.h#n322

Specifically for shrinkers, I'd like it if we had rolling averages over the
past few seconds for e.g. the _rate_ of objects requested to be freed vs.
actually freed. If we collect those kinds of rate measurements (and perhaps
latency too, to show stalls) at various places in the MM code, perhaps we'd
be able to see what's getting stuck when we OOM. We should have the rate of
objects getting added, too, and we should be collecting data from the
list_lru code as well, like you were mentioning the other night.

And if we collect this data in such a way that it can be displayed in
sysfs - but via the to_text() methods I've been talking about - it'll also
be trivial to include it in the show_mem() report when we OOM.

Anyways, that's my two cents....
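
Since "rolling average of a rate" is a bit hand-wavy, here's roughly the
shape of the thing I mean - a throwaway userspace sketch, not the bcachefs
code linked above, and every name in it (rate_stats, rate_stats_update(),
rate_stats_to_text()) is made up for illustration: an exponentially-decaying
count of objects a shrinker was asked to free vs. objects it actually freed,
plus a to_text()-style formatter.

	#include <math.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <time.h>

	struct rate_stats {
		uint64_t	last_ns;	/* time of last update */
		double		requested;	/* decayed count of objects asked for */
		double		freed;		/* decayed count of objects freed */
	};

	static uint64_t now_ns(void)
	{
		struct timespec ts;

		clock_gettime(CLOCK_MONOTONIC, &ts);
		return (uint64_t) ts.tv_sec * 1000000000ULL + ts.tv_nsec;
	}

	/* Decay the counters with a ~4 second half-life, then add new samples: */
	static void rate_stats_update(struct rate_stats *s,
				      uint64_t requested, uint64_t freed)
	{
		uint64_t now = now_ns();
		double elapsed_sec = (now - s->last_ns) / 1e9;
		double decay = elapsed_sec > 0 ? pow(0.5, elapsed_sec / 4.0) : 1.0;

		s->requested	= s->requested * decay + requested;
		s->freed	= s->freed * decay + freed;
		s->last_ns	= now;
	}

	/* to_text()-style output, so the same stat can feed sysfs and show_mem(): */
	static void rate_stats_to_text(const struct rate_stats *s,
				       char *buf, size_t len)
	{
		snprintf(buf, len, "requested %.0f freed %.0f (rolling)",
			 s->requested, s->freed);
	}

	int main(void)
	{
		struct rate_stats s = { .last_ns = now_ns() };
		char buf[80];

		/* pretend a shrinker was asked for 1024 objects but only freed 10: */
		rate_stats_update(&s, 1024, 10);
		rate_stats_to_text(&s, buf, sizeof(buf));
		puts(buf);
		return 0;
	}

In the kernel this would obviously use ktime_get() and hang off the shrinker
scan path instead of main(), but the point is the to_text()-style formatter:
the same counters can show up in sysfs and in the show_mem() report with no
extra work.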
I can't claim to have any brilliant insights here, but I hope Roman will start taking ideas from more people (and Dave's been a real wealth of information on this topic! I'd pick his brain if I were you, Roman).