On Mon, Jun 24, 2024 at 05:57:51AM GMT, Yosry Ahmed wrote:
> > > and I will explain why below. I know it may be a necessary
> > > evil, but I would like us to make sure there is no other option before
> > > going forward with this.
> >
> > Instead of necessary evil, I would call it a pragmatic approach i.e.
> > resolve the ongoing pain with good enough solution and work on long term
> > solution later.
>
> It seems like there are a few ideas for solutions that may address
> longer-term concerns, let's make sure we try those out first before we
> fall back to the short-term mitigation.
>

Why? More specifically, why try out other things before this patch? Both
can be done in parallel. This patch has been running in production at
Meta for several weeks without issues. Also I don't see how merging this
would prevent us from working on long term solutions.

[...]
>
> Thanks for explaining this in such detail. It does make me feel
> better, but keep in mind that the above heuristics may change in the
> future and become more sensitive to stale stats, and very likely no
> one will remember that we decided that stale stats are fine
> previously.
>

When was the last time this heuristic changed? It was introduced in 2008
for anon pages and extended to file pages in 2016. In 2019 the ratio
enforcement at 'reclaim root' was introduced. I am pretty sure we will
improve the whole rstat flushing thing within a year or so :P

> >
> > For the cache trim mode, inactive file LRU size is read and the kernel
> > scales it down based on the reclaim iteration (file >> sc->priority) and
> > only checks if it is zero or not. Again precise information is not
> > needed.
> It sounds like it is possible that we enter the cache trim mode when
> we shouldn't if the stats are stale. Couldn't this lead to
> over-reclaiming file memory?
>

Can you explain how this over-reclaiming of file memory would happen?
(See the sketch of the cache trim mode check at the end of this mail.)

[...]
>
> > > - Try to figure out if one (or a few) update paths are regressing all
> > > flushers. If one specific stat or stats update path is causing most of
> > > the updates, we can try to fix that instead. Especially if it's a
> > > counter that is continuously being increased and decreases (so the net
> > > change is not as high as we think).
> > This is actually a good point. I remember Jasper telling that MEMCG_KMEM
> > might be the one with most updates. I can try to collect from Meta fleet
> > what is the cause of most updates.
>
> Let's also wait and see what comes out of this. It would be
> interesting if we can fix this on the update side instead.
>

Yes, it would be interesting, but I don't see any reason to wait for it.

> >
> > >
> > > At the end of the day, all of the above may not work, and we may have
> > > to live with just using the ratelimited approach. But I *really* hope
> > > we could actually go the other way. Fix things on a more fundamental
> > > level and eventually drop the ratelimited variants completely.
> > >
> > > Just my 2c. Sorry for the long email :)
> > Please note that this is not some user API which can not be changed
> > later. We can change and disect however we want. My only point is not to
> > wait for the perfect solution and have some intermediate and good enough
> > solution.
>
> I agree that we shouldn't wait for a perfect solution, but it also
> seems like there are a few easy-ish solutions that we can discover
> first (Jesper's patch, investigating update paths, etc). If none of
> those pan out, we can fall back to the ratelimited flush, ideally with
> a plan on next steps for a longer-term solution.

I think I already explained why there is no need to wait. One thing we
should agree on is that this is a hard problem and will need multiple
iterations to come up with a solution that is acceptable to most. Until
then I don't see any reason to block mitigations that reduce the pain.

thanks,
Shakeel
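P.S. Since the cache trim mode check came up above, here is a small
userspace sketch (my own illustration, not the kernel code, and the page
counts below are made up) of the described logic: the inactive file LRU
size is shifted right by the reclaim priority and only the zero vs
non-zero result is used, so a somewhat stale count usually leads to the
same decision.

/*
 * Illustrative only: mimics the check described above, where the
 * inactive file LRU size is scaled by the reclaim priority
 * (file >> sc->priority) and only zero/non-zero matters.
 */
#include <stdio.h>

#define DEF_PRIORITY 12	/* reclaim starts at priority 12 in the kernel */

static int cache_trim_mode(unsigned long inactive_file, int priority)
{
	return (inactive_file >> priority) != 0;
}

int main(void)
{
	unsigned long exact = 1UL << 20;	/* hypothetical 1M inactive file pages */
	unsigned long stale = exact - 4096;	/* same count, stale by 4096 pages */

	for (int prio = DEF_PRIORITY; prio >= 0; prio--)
		printf("prio %2d: exact=%d stale=%d\n", prio,
		       cache_trim_mode(exact, prio),
		       cache_trim_mode(stale, prio));
	return 0;
}

With these numbers the exact and stale columns come out identical at
every priority, which is the point about precise information not being
needed here.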