On Tue, Nov 24, 2015 at 04:59:40PM -0500, Johannes Weiner wrote:
...
> @@ -2396,6 +2396,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  		memcg = mem_cgroup_iter(root, NULL, &reclaim);
>  		do {
>  			unsigned long lru_pages;
> +			unsigned long reclaimed;
>  			unsigned long scanned;
>  			struct lruvec *lruvec;
>  			int swappiness;
> @@ -2408,6 +2409,7 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  
>  			lruvec = mem_cgroup_zone_lruvec(zone, memcg);
>  			swappiness = mem_cgroup_swappiness(memcg);
> +			reclaimed = sc->nr_reclaimed;
>  			scanned = sc->nr_scanned;
>  
>  			shrink_lruvec(lruvec, swappiness, sc, &lru_pages);
> @@ -2418,6 +2420,11 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  					    memcg, sc->nr_scanned - scanned,
>  					    lru_pages);
>  
> +			/* Record the group's reclaim efficiency */
> +			vmpressure(sc->gfp_mask, memcg, false,
> +				   sc->nr_scanned - scanned,
> +				   sc->nr_reclaimed - reclaimed);
> +

Suppose we have the following cgroup configuration.

A __ B
  \_ C

A is empty (which is natural for the unified hierarchy AFAIU). B has
some workload running in it, and C generates socket pressure. Due to
the socket pressure coming from C we start reclaim in A, which results
in thrashing of B, but we might not put sockets under pressure in A or
C, because vmpressure does not account pages scanned/reclaimed in B
when generating a vmpressure event for A or C. This might result in
aggressive reclaim and thrashing in B w/o generating a signal for C to
stop growing socket buffers.

Do you think such a situation is possible? If so, would it make sense
to switch to post-order walk in shrink_zone and pass sub-tree
scanned/reclaimed stats to vmpressure for each scanned memcg?
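
To make the post-order idea a bit more concrete, here is a rough
userspace sketch, not kernel code. struct group, report_pressure() and
walk_post_order() are made-up names standing in for the memcg,
vmpressure() and the walk in shrink_zone; the real iterator and
locking are ignored entirely. Children are visited first, so each
group is reported together with the totals of its whole subtree:

```c
#include <stddef.h>

/* Hypothetical stand-in for a memcg; NOT the kernel's struct mem_cgroup. */
struct group {
	const char *name;
	struct group *children[4];
	size_t nr_children;
	unsigned long scanned;		/* pages scanned in this group alone */
	unsigned long reclaimed;	/* pages reclaimed in this group alone */
	unsigned long tree_scanned;	/* what the pressure report was given */
	unsigned long tree_reclaimed;
};

/* Hypothetical stand-in for vmpressure(): only records its arguments. */
static void report_pressure(struct group *g, unsigned long scanned,
			    unsigned long reclaimed)
{
	g->tree_scanned = scanned;
	g->tree_reclaimed = reclaimed;
}

/*
 * Post-order walk: descend into all children first, sum up their
 * subtree totals, then report this group with its own counts plus
 * everything below it. The totals are handed back to the caller so
 * the parent can include them as well.
 */
static void walk_post_order(struct group *g, unsigned long *scanned,
			    unsigned long *reclaimed)
{
	unsigned long s = g->scanned;
	unsigned long r = g->reclaimed;
	size_t i;

	for (i = 0; i < g->nr_children; i++) {
		unsigned long cs = 0, cr = 0;

		walk_post_order(g->children[i], &cs, &cr);
		s += cs;
		r += cr;
	}

	report_pressure(g, s, r);
	*scanned = s;
	*reclaimed = r;
}
```

In this toy model, with all of the scanning happening in B, the report
for A would include B's scanned/reclaimed counts, while the report for
C would still cover only C itself, since B is a sibling and not part of
C's subtree.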

Thanks,
Vladimir

>  		/*
>  		 * Direct reclaim and kswapd have to scan all memory
>  		 * cgroups to fulfill the overall scan target for the
> @@ -2449,7 +2456,8 @@ static bool shrink_zone(struct zone *zone, struct scan_control *sc,
>  			reclaim_state->reclaimed_slab = 0;
>  		}
>  
> -		vmpressure(sc->gfp_mask, sc->target_mem_cgroup,
> +		/* Record the subtree's reclaim efficiency */
> +		vmpressure(sc->gfp_mask, sc->target_mem_cgroup, true,
>  			   sc->nr_scanned - nr_scanned,
>  			   sc->nr_reclaimed - nr_reclaimed);
>  