On Mon, Nov 30, 2015 at 10:58:38AM -0500, Johannes Weiner wrote:
> On Mon, Nov 30, 2015 at 02:36:28PM +0300, Vladimir Davydov wrote:
> > Suppose we have the following cgroup configuration.
> >
> >   A __ B
> >     \_ C
> >
> > A is empty (which is natural for the unified hierarchy AFAIU). B has
> > some workload running in it, and C generates socket pressure. Due to the
> > socket pressure coming from C we start reclaim in A, which results in
> > thrashing of B, but we might not put sockets under pressure in A or C,
> > because vmpressure does not account pages scanned/reclaimed in B when
> > generating a vmpressure event for A or C. This might result in
> > aggressive reclaim and thrashing in B w/o generating a signal for C to
> > stop growing socket buffers.
> >
> > Do you think such a situation is possible? If so, would it make sense to
> > switch to post-order walk in shrink_zone and pass sub-tree
> > scanned/reclaimed stats to vmpressure for each scanned memcg?
>
> In that case the LRU pages in C would experience pressure as well,
> which would then reign in the sockets in C. There must be some LRU
> pages in there, otherwise who is creating socket pressure?
>
> The same applies to shrinkers. All secondary reclaim is driven by LRU
> reclaim results.
>
> I can see that there is some unfairness in distributing memcg reclaim
> pressure purely based on LRU size, because there are scenarios where
> the auxiliary objects (incl. sockets, but mostly shrinker pools)
> amount to a significant portion of the group's memory footprint. But
> substitute group for NUMA node and we've had this behavior for
> years. I'm not sure it's actually a problem in practice.
>

Fair enough. Let's wait until we hit this problem in the real world then.

The patch looks good to me.
Reviewed-by: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>

Thanks,
Vladimir