On Tue, Jun 13, 2017 at 08:01:57AM -0400, Josef Bacik wrote: > On Tue, Jun 13, 2017 at 02:28:02PM +0900, Minchan Kim wrote: > > Hello, > > > > On Thu, Jun 08, 2017 at 03:19:05PM -0400, josef@xxxxxxxxxxxxxx wrote: > > > From: Josef Bacik <jbacik@xxxxxx> > > > > > > When testing a slab heavy workload I noticed that we often would barely > > > reclaim anything at all from slab when kswapd started doing reclaim. > > > This is because we use the ratio of nr_scanned / nr_lru to determine how > > > much of slab we should reclaim. But in a slab only/mostly workload we > > > will not have much page cache to reclaim, and thus our ratio will be > > > really low and not at all related to where the memory on the system is. > > > > I want to understand this clearly. > > Why nr_scanned / nr_lru is low if system doesnt' have much page cache? > > Could you elaborate it a bit? > > > > Yeah so for example on my freshly booted test box I have this > > Active: 58840 kB > Inactive: 46860 kB > > Every time we do a get_scan_count() we do this > > scan = size >> sc->priority > > where sc->priority starts at DEF_PRIORITY, which is 12. The first loop through > reclaim would result in a scan target of 2 pages to 11715 total inactive pages, > and 3 pages to 14710 total active pages. This is a really really small target > for a system that is entirely slab pages. And this is super optimistic, this > assumes we even get to scan these pages. We don't increment sc->nr_scanned > unless we 1) isolate the page, which assumes it's not in use, and 2) can lock > the page. Under pressure these numbers could probably go down, I'm sure there's > some random pages from daemons that aren't actually in use, so the targets get > even smaller. > > We have to get sc->priority down a lot before we start to get to the 1:1 ratio > that would even start to be useful for reclaim in this scenario. Add to this > that most shrinkable slabs have this idea that their objects have to loop > through the LRU twice (no longer icache/dcache as Al took my patch to fix that > thankfully) and you end up spending a lot of time looping and reclaiming > nothing. Basing it on actual slab usage makes more sense logically and avoids > this kind of problem. Thanks, Thanks. I got understood now. As I see your change, it seems to be rather aggressive to me. node_slab = lruvec_page_state(lruvec, NR_SLAB_RECLAIMABLE); shrink_slab(,,, node_slab >> sc->priority, node_slab); The point is when we finish reclaiming from direct/background(ie, kswapd), it makes sure that VM scanned slab object up to twice of the size which is consistent with LRU pages. What do you think about this? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>