On Wed, Jul 05, 2017 at 01:59:12PM +0900, Minchan Kim wrote:
> Hi Dave,
> 
> On Wed, Jul 05, 2017 at 08:57:58AM +1000, Dave Chinner wrote:
> > On Tue, Jul 04, 2017 at 09:21:37AM -0400, Josef Bacik wrote:
> > > On Tue, Jul 04, 2017 at 12:01:00PM +0900, Minchan Kim wrote:
> > > > 1. slab *page* reclaim
> > > >
> > > > Your claim is that it's hard to reclaim a page because of slab
> > > > fragmentation, so we need to reclaim objects more aggressively.
> > > >
> > > > Basically, aggressive scanning doesn't guarantee reclaiming a page;
> > > > it just increases the probability. Even granting that slab works
> > > > with the merging feature (i.e., it mixes several object types of
> > > > the same size in one slab), that probability drops hugely if you
> > > > bail out on a certain shrinker. So, to work well, we would have to
> > > > crank up the aggressiveness enough to sweep every object from every
> > > > shrinker. I guess that's why your patch makes the logic so
> > > > aggressive. My concern with that aggressiveness is that we reclaim
> > > > all objects too early and end up defeating the caching scheme. I'm
> > > > not sure it's a gain in the end.
> > >
> > > Well, the fact is that what we have doesn't work, and I've been
> > > staring at this problem for a few months and I don't have a better
> > > solution.
> > >
> > > And keep in mind we're talking about a purely slab workload,
> > > something that isn't likely to be a common case. And even if our
> > > scan target is 2x, we aren't going to reclaim the entire cache
> > > before we bail out. We only scan in 'batch_size' chunks, which
> > > generally is 1024. In the worst case, where we have one in-use
> > > object on every slab page we own, then yes, we're fucked, but we're
> > > still fucked with the current code; only with the current code it'll
> > > take us 20 minutes of looping in the VM vs. seconds scanning the
> > > whole list twice.
> >
> > Right - this is where growth/allocation rate based aging scans
> > come into play, rather than waiting for the VM to hit some unknown
> > ceiling and do an unpredictable amount of scanning.
> 
> http://www.spinics.net/lists/linux-mm/msg129470.html
> 
> I suggested a statically increasing scan amount (1/12 + 2/12 + 3/12 ...),
> which is more aggressive than what we have now. With this, in a reclaim
> cycle (priority 12..0), we guarantee scanning the entire object list four
> times while the LRU is scanned two times. Although I believe we don't
> need four times (i.e., two times is enough), it's just a compromise with
> Josef's much too aggressive slab reclaim. It would be more predictable
> and aggressive from the VM's point of view.

Yes, I read and understood that post, but you are talking about changing
reclaim behaviour when there is low memory, not dealing with the aging
problem.

It's a brute force, big hammer approach, yet we already know that
increasingly aggressive reclaim of caches is a problem for filesystems in
that it drives us into OOM conditions faster. i.e. aggressive shrinker
reclaim trashes the working set of cached filesystem metadata and so
increases the GFP_NOFS memory allocation demand required by the
filesystem at times of critically low memory....

> If some shrinker cannot be happy with this policy, it could accelerate
> the scanning for only that shrinker under the shrink_slab call, although
> I don't like that because it's out of the VM's control, so I'm okay with
> your per-shrinker aging callback regardless of shrink_slab.
> My point is that if a shrinker is painful to reclaim from, it should
> have its own model to solve that rather than making the general slab
> reclaim strategy very aggressive.

We already do this in various filesystems. The issue is that we don't
know that we should reclaim caches until shrinker callbacks start
happening. i.e. there's *no feedback mechanism* that allows us to age
shrinker-controlled caches over time. Memory reclaim is a feedback loop,
but if we never hit low memory, then it's never invoked until we actually
run out of memory, and so it drowns in aging rather than reclaim work
when it does get run.

Stop looking at the code and start thinking about the architecture - how
the subsystems connect and what control/feedback mechanisms are required
to allow them to work correctly together. We solve balance and breakdown
problems by identifying the missing or sub-optimal feedback loops and
fixing them.

In this case, what we are missing is the mechanism to detect and control
"cache growth in single-use workloads when there is no memory pressure".
Sustained cache allocation should trigger some amount of aging regardless
of how much free memory we have, otherwise we simply fill up memory with
objects we're never going to use again.....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
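
To make that last point concrete, below is a minimal sketch of the kind
of allocation-driven aging described above. It is illustrative only, not
kernel code: the names (struct cache_stats, cache_aging_tick,
AGING_DIVISOR) and the policy of aging half of the recent allocations per
tick are assumptions for the example, not anything that exists in the
kernel today.

	/*
	 * Illustrative sketch only -- not kernel code.  The idea: aging
	 * work is driven by the cache's own allocation rate, not by a
	 * low-memory threshold, so a cache that grows steadily is trimmed
	 * steadily even when plenty of memory is free.
	 */
	struct cache_stats {
		unsigned long nr_objects;	/* objects currently cached */
		unsigned long nr_allocated;	/* allocations since last tick */
	};

	/* Fraction of recent allocations to age per tick (assumed knob). */
	#define AGING_DIVISOR	2

	/*
	 * Called periodically (e.g. from a per-cache worker).  Returns the
	 * number of objects the cache should try to reclaim this tick.
	 * Note there is no check of free memory anywhere: sustained
	 * allocation alone is what generates aging pressure.
	 */
	static unsigned long cache_aging_tick(struct cache_stats *cs)
	{
		unsigned long nr_to_age = cs->nr_allocated / AGING_DIVISOR;

		/* Never try to age more than is actually cached. */
		if (nr_to_age > cs->nr_objects)
			nr_to_age = cs->nr_objects;

		/* Start a fresh measurement window for the next tick. */
		cs->nr_allocated = 0;

		return nr_to_age;
	}

A real implementation would presumably feed the returned count into the
existing per-shrinker scan machinery rather than use a standalone helper
like this, but the essential property is the one argued for above:
sustained allocation generates aging pressure even when free memory is
plentiful.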