On Sun, Apr 03, 2022 at 08:56:18AM +0800, Hillf Danton wrote: > On Sat, 2 Apr 2022 10:54:36 -0700 Roman Gushchin wrote: > > Hello Hillf! > > > Hello Roman, > > > Thank you for sharing it, really interesting! I=E2=80=99m actually working o= > > n the same problem.=20 > > Good to know you have some interest in it. > Feel free to let me know you would like to take it over to avoid > repeated works on both sides. > > > > > No code to share yet, but here are some of my thoughts: > > 1) If there is a =E2=80=9Cnatural=E2=80=9D memory pressure, no additional sl= > > ab scanning is needed. > > Agree - the periodic shrinker can be canceled once kswapd wakes up. I think we should be waking up per-node kswapd to do the periodic shrinking, not adding yet another way of executing (thousands of) shrinkers (across hundreds of nodes) from a single threaded context. Indeed, the problem of "no reclaim when there is no memory pressure" also affects the page cache, not just shrinkable caches and we really don't want periodic reclaim to have a compeltely different behaviour to general memory reclaim. i.e. the amount of work that shrinkers need to do in a periodic scan is largerly determined by the rate of shrinkable cache memory usage growth rather than memory reclaim priority as it is now. Hence there needs to be different high level "shrinker needs to do X amount of work" calculation for periodic reclaim than there is now. e.g. we calculate a rolling average of the size of the cache and a rate of change over a series of polling operations (i.e. calling ->scan_count) and then when sustained growth is detected we start trying to shrink the cache to limit the rate of growth of the cache. If the cache keeps growing, then it's objects are being repeatedly referenced and it *should* keep growing. If it's one-off objects that are causing the growth of the cache and so objects are being reclaimed by the shrinker, then matching the periodic shrink scan to the growth rate will substantially reduce the rate of growth of that cache. And if it's integrated into the existing do_shrink_slab calculations, the moment actual memory reclaim calls the shrinker the periodic scan calculations can be reset back to zero and it starts again... > > 2) =46rom a power perspective it=E2=80=99s better to scan more at once, but l= > > ess often. > > The shrinker proposed is a catapult on the vmscan side without knowing > where the cold slab objects are piling up in Dave's backyard but he is > free to take different actions than the regular shrinker - IOW this > shrinker alone does not make much sense wrt shooting six birds without > the stone on the slab owner side. > > It is currently scanning *every* slab cache at an arbitrary frequency, > once 30 seconds - I am open to a minute or whatever. Sorry, I don't understand what "Dave's backyard" is or why it would ever need to be special cased? > > 3) Maybe we need a feedback loop with the slab allocator: e.g. if slabs are a= > > lmost full there is more sense to do a proactive scanning and free up some m= > > emory, otherwise we=E2=80=99ll end up allocating more slabs. But it=E2=80=99= > > s tricky. > > There are 31 bits available in the periodic flag added to shrink control. > > > 4) If the scanning is not resulting in any memory reclaim, maybe we should (= > > temporarily) exclude the corresponding shrinker from the scanning. > > Given the periodic flag, Dave is free to ignore the scan request and the > scan result is currently dropped on the vmscan side because what is > considered is the cold slab objects that for instance have been inactive > for more than 30 seconds in every slab cache, rather than kswapd's cake. I don't understand how passing a "periodic" flag to individual shrinkers is really useful here. How does the shrinker implementation use this to determine how much work it needs to do? i.e. The amount of work a shrinker needs to perform is calculated by the high level slab scanning code based on relative cache size and reclaim priority. If there's a periodic scanner, it should be calculating a specific amount of work for the shrinkers to do way up in do_shrink_slab() and then asking the shrinker to perform that work in exactly the way it does now - the shrinker itself doesn't need to know anything about whether it's a periodic memory reclaim scan or whether there's actual memory pressure - it just needs to scan the oldest objects in it's cache to try to reclaim them. Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx