On Fri, Oct 24, 2014 at 02:37:34PM +0400, Vladimir Davydov wrote:
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index a384339bf718..2cf6b04a4e0c 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -339,6 +339,26 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
>         return freed;
>  }
>
> +static unsigned long
> +run_shrinker(struct shrink_control *shrinkctl, struct shrinker *shrinker,
> +             unsigned long nr_pages_scanned, unsigned long lru_pages)
> +{
> +        unsigned long freed = 0;
> +
> +        if (!(shrinker->flags & SHRINKER_NUMA_AWARE)) {
> +                shrinkctl->nid = 0;
> +                return shrink_slab_node(shrinkctl, shrinker,
> +                                        nr_pages_scanned, lru_pages);
> +        }
> +
> +        for_each_node_mask(shrinkctl->nid, shrinkctl->nodes_to_scan) {
> +                if (node_online(shrinkctl->nid))
> +                        freed += shrink_slab_node(shrinkctl, shrinker,
> +                                                  nr_pages_scanned, lru_pages);
> +        }
> +        return freed;
> +}

The slab shrinking logic accumulates the lru pages, as well as the
nodes_to_scan mask, when going over the zones, only to go over the zones
here again using the accumulated node information.  Why not just invoke
the thing per-zone instead in the first place (rough sketch in the P.S.
below)?  Kswapd already does that (although it could probably work with
the per-zone lru_pages and nr_scanned deltas) and direct reclaim should
as well.  It would simplify the existing code as well as your series a
lot.

> +                /*
> +                 * For memcg-aware shrinkers iterate over the target memcg
> +                 * hierarchy and run the shrinker on each kmem-active memcg
> +                 * found in the hierarchy.
> +                 */
> +                shrinkctl->memcg = shrinkctl->target_mem_cgroup;
> +                do {
> +                        if (!shrinkctl->memcg ||
> +                            memcg_kmem_is_active(shrinkctl->memcg))
> +                                freed += run_shrinker(shrinkctl, shrinker,
>                                                 nr_pages_scanned, lru_pages);
> -
> -                }
> +                } while ((shrinkctl->memcg =
> +                          mem_cgroup_iter(shrinkctl->target_mem_cgroup,
> +                                          shrinkctl->memcg, NULL)) != NULL);

More symptoms of the above.  This hierarchy walk is duplicative and
potentially quite expensive.

> @@ -2381,6 +2414,7 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>         gfp_t orig_mask;
>         struct shrink_control shrink = {
>                 .gfp_mask = sc->gfp_mask,
> +                .target_mem_cgroup = sc->target_mem_cgroup,
>         };
>         enum zone_type requested_highidx = gfp_zone(sc->gfp_mask);
>         bool reclaimable = false;
> @@ -2400,18 +2434,22 @@ static bool shrink_zones(struct zonelist *zonelist, struct scan_control *sc)
>                                         gfp_zone(sc->gfp_mask), sc->nodemask) {
>                 if (!populated_zone(zone))
>                         continue;
> +
> +                if (global_reclaim(sc) &&
> +                    !cpuset_zone_allowed(zone, GFP_KERNEL | __GFP_HARDWALL))
> +                        continue;
> +
> +                lru_pages += global_reclaim(sc) ?
> +                        zone_reclaimable_pages(zone) :
> +                        mem_cgroup_zone_reclaimable_pages(zone,
> +                                                sc->target_mem_cgroup);
> +                node_set(zone_to_nid(zone), shrink.nodes_to_scan);

And yet another costly hierarchy walk.

The reclaim code walks zonelists according to a nodemask, and within
each zone it walks lruvecs according to the memcg hierarchy.  The
shrinkers are wrong in making up an ad-hoc concept of NUMA nodes that
otherwise does not exist anywhere in the VM.  Please integrate them
properly instead of adding more duplication on top.

Thanks
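
P.S.: To make the per-zone suggestion concrete, here is a rough and
untested sketch of the direction I mean.  It is vmscan.c-internal,
reuses shrink_slab_node() and the ->memcg member your series adds to
shrink_control, and all names and signatures should be read as
approximate, not as a finished implementation:

/*
 * Hypothetical helper, called from the per-memcg loop in shrink_zone()
 * (and from kswapd's per-zone balancing) with the nr_scanned delta and
 * the reclaimable lru pages of the lruvec that was just scanned.
 */
static void shrink_zone_slab(struct zone *zone, struct scan_control *sc,
                             struct mem_cgroup *memcg,
                             unsigned long nr_scanned,
                             unsigned long lru_pages)
{
        struct shrink_control shrink = {
                .gfp_mask = sc->gfp_mask,
                .memcg = memcg,         /* field added by this series */
        };
        struct shrinker *shrinker;

        /* Only kmem-active memcgs have per-memcg caches to shrink. */
        if (memcg && !memcg_kmem_is_active(memcg))
                return;

        if (!down_read_trylock(&shrinker_rwsem))
                return;

        list_for_each_entry(shrinker, &shrinker_list, list) {
                /*
                 * Whatever filtering your patch does to skip shrinkers
                 * that are not memcg-aware when memcg != NULL would go
                 * here as well.
                 *
                 * The zone pins down the node, so no nodes_to_scan mask
                 * is needed; !SHRINKER_NUMA_AWARE shrinkers keep getting
                 * plain node-0 invocations as before.
                 */
                shrink.nid = (shrinker->flags & SHRINKER_NUMA_AWARE) ?
                                zone_to_nid(zone) : 0;
                shrink_slab_node(&shrink, shrinker, nr_scanned, lru_pages);
        }
        up_read(&shrinker_rwsem);
}

That way slab pressure follows the exact same zonelist/nodemask walk and
memcg hierarchy walk that page reclaim already does, for both kswapd and
direct reclaim, and the lru_pages/nodes_to_scan accumulation in
shrink_zones() as well as the extra mem_cgroup_iter() loop in
shrink_slab() can go away.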