On Mon, Dec 02, 2013 at 03:19:41PM +0400, Vladimir Davydov wrote:
> Currently in addition to a shrink_control struct shrink_slab() takes two
> arguments, nr_pages_scanned and lru_pages, which are used for balancing
> slab reclaim versus page reclaim - roughly speaking, shrink_slab() will
> try to scan nr_pages_scanned/lru_pages fraction of all slab objects.

Yes, that is its primary purpose, and the variables explain that clearly.
i.e. it passes a quantity of work to be done, and a value to relate that
to the overall size of the cache that the work was done on. i.e. they tell
us that shrink_slab is trying to stay in balance with the amount of work
that page cache reclaim has just done.

> However, shrink_slab() is not always called after page cache reclaim.
> For example, drop_slab() uses shrink_slab() to drop as many slab objects
> as possible and thus has to pass phony values 1000/1000 to it, which do
> not make sense for nr_pages_scanned/lru_pages. Moreover, as soon as

Yup, but that's not the primary purpose of the code, and doesn't require
balancing against page cache reclaim. Hence the numbers that are passed in
are just there to make the shrinkers iterate efficiently but without doing
too much work in a single scan. i.e. reclaim in chunks across all caches,
rather than try to completely remove a single cache at a time....

And the reason that this is done? Because caches have reclaim
relationships that mean some shrinkers can't make progress until other
shrinkers do their work. Hence to effectively free all memory, we have to
iterate repeatedly across all caches. That's what drop_slab() does.

> kmemcg reclaim is introduced, we will have to make up phony values for
> nr_pages_scanned and lru_pages again when doing kmem-only reclaim for a
> memory cgroup, which is possible if the cgroup has its kmem limit less
> than the total memory limit.

I'm missing something here - why would memcg reclaim require passing
phony values?
How are you going to keep slab caches in balance with memory pressure
generated by the page cache?

And, FWIW:

> unsigned long shrink_slab(struct shrink_control *shrink,
> -				unsigned long nr_pages_scanned,
> -				unsigned long lru_pages);
> +				unsigned long fraction, unsigned long denominator);

I'm not sure what "fraction" means in this case. A fraction is made up of
a numerator and denominator, but:

> @@ -243,9 +243,9 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
>  	nr = atomic_long_xchg(&shrinker->nr_deferred[nid], 0);
>
>  	total_scan = nr;
> -	delta = (4 * nr_pages_scanned) / shrinker->seeks;
> +	delta = (4 * fraction) / shrinker->seeks;

(4 * nr_pages_scanned) is a dividend, while:

>  	delta *= max_pass;
> -	do_div(delta, lru_pages + 1);
> +	do_div(delta, denominator + 1);

(lru_pages + 1) is a divisor.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body
to majordomo@xxxxxxxxx. For more info on Linux MM,
see: http://www.linux-mm.org/ .