On Tue, Apr 09, 2013 at 02:13:13PM +0200, Michal Hocko wrote:
> Memcg soft reclaim has been traditionally triggered from the global
> reclaim paths before calling shrink_zone. mem_cgroup_soft_limit_reclaim
> then picked up a group which exceeds the soft limit the most and
> reclaimed it with 0 priority to reclaim at least SWAP_CLUSTER_MAX pages.
>
> The infrastructure requires per-node-zone trees which hold over-limit
> groups and keep them up-to-date (via memcg_check_events), which is not
> cost free. Although this overhead hasn't turned out to be a bottleneck,
> the implementation is suboptimal because mem_cgroup_update_tree has no
> idea which zones consumed memory over the limit, so we could easily end
> up having a group on a node-zone tree with only a few pages from that
> node-zone.
>
> This patch doesn't try to fix node-zone tree management because
> integrating soft reclaim into zone shrinking seems much easier and
> more appropriate, for several reasons.
> First of all, 0 priority reclaim was a crude hack which might lead to
> big stalls if the group's LRUs are big and hard to reclaim (e.g. a lot
> of dirty/writeback pages).
> Soft reclaim should also be applicable to targeted reclaim, which is
> awkward right now without additional hacks.
> Last but not least, the whole infrastructure eats a lot of code[1].
>
> After this patch shrink_zone is done in 2 passes. First it tries to do
> soft reclaim if appropriate (only for global reclaim for now, to stay
> compatible with the current state) and falls back to ignoring the soft
> limit if no group is eligible for soft reclaim or nothing has been
> scanned during the first pass. Only groups which are over their soft
> limit, or which have a parent over the limit somewhere up the
> hierarchy, are considered eligible during the first pass.
>
> TODO: remove mem_cgroup_tree_per_zone, mem_cgroup_shrink_node_zone and
> co. but maybe it would be easier for review to remove that code in a
> separate patch...

It should be in this series, though, for the diffstat :-)

> ---
> [1] TODO: put size vmlinux before/after whole clean-up

Yes!

> @@ -1984,6 +2003,27 @@ static void shrink_zone(struct zone *zone, struct scan_control *sc)
>          } while (memcg);
>      } while (should_continue_reclaim(zone, sc->nr_reclaimed - nr_reclaimed,
>                       sc->nr_scanned - nr_scanned, sc));
> +
> +    return nr_shrunk;
> +}
> +
> +
> +static void shrink_zone(struct zone *zone, struct scan_control *sc)
> +{
> +    bool do_soft_reclaim = mem_cgroup_should_soft_reclaim(sc);
> +    unsigned long nr_scanned = sc->nr_scanned;
> +    unsigned nr_shrunk;
> +
> +    nr_shrunk = __shrink_zone(zone, sc, do_soft_reclaim);
> +
> +    /*
> +     * No group is over the soft limit or those that are do not have
> +     * pages in the zone we are reclaiming so we have to reclaim everybody
> +     */
> +    if (do_soft_reclaim && (!nr_shrunk || sc->nr_scanned == nr_scanned)) {

If no pages were scanned, you are doing a second pass regardless of
nr_shrunk.  And if pages were scanned, nr_shrunk must have increased
as well.

So I think you can remove all the nr_shrunk counting and just check
for scanned pages, no?
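
Something like this, as a rough sketch (untested, and assuming
__shrink_zone() otherwise keeps the shape it has in your patch):

static void shrink_zone(struct zone *zone, struct scan_control *sc)
{
    bool do_soft_reclaim = mem_cgroup_should_soft_reclaim(sc);
    unsigned long nr_scanned = sc->nr_scanned;

    /* First pass: restrict to soft-limit-eligible groups if asked to */
    __shrink_zone(zone, sc, do_soft_reclaim);

    /*
     * No group was over its soft limit, or the groups that were
     * have no pages in this zone: reclaim everybody.
     */
    if (do_soft_reclaim && sc->nr_scanned == nr_scanned)
        __shrink_zone(zone, sc, false);
}

That would also let __shrink_zone() stay void, since nobody would
consume its return value anymore.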