Re: [PATCH V6 00/10] memcg: per cgroup background reclaim

On Thu, Apr 21, 2011 at 01:00:16PM +0900, KAMEZAWA Hiroyuki wrote:
> On Thu, 21 Apr 2011 04:51:07 +0200
> Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> 
> > > If the cgroup is configured to use per cgroup background reclaim, a kswapd
> > > thread is created which only scans the per-memcg LRU list.
> > 
> > We already have direct reclaim, direct reclaim on behalf of a memcg,
> > and global kswapd-reclaim.  Please don't add yet another reclaim path
> > that does its own thing and interacts unpredictably with the rest of
> > them.
> > 
> > As discussed at LSF, we want to get rid of the global LRU.  So the
> > goal is to have each reclaim entry end up at the same core part of
> > reclaim that round-robin scans a subset of zones from a subset of
> > memory control groups.
> 
> It's not related to this set. And I think even if we remove global LRU,
> global-kswapd and memcg-kswapd need to do independent work.
> 
> global-kswapd : works for zone/node balancing and making free pages,
>                 and compaction. select a memcg victim and ask it
>                 to reduce memory with regard to gfp_mask. Starts its work
>                 when zone/node is unbalanced.

For soft limit reclaim (which is triggered by global memory pressure),
we want to scan a group of memory cgroups equally in round robin
fashion.  I think at LSF we established that it is not fair to find
the one that exceeds its limit the most and hammer it until memory
pressure is resolved or there is another group with more excess.

So even for global kswapd, sooner or later we need a mechanism to
apply equal pressure to a set of memcgs.

With the removal of the global LRU, we ALWAYS operate on a set of
memcgs in a round-robin fashion, not just for soft limit reclaim.

So yes, these are two different things, but they have the same
requirements.
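
To illustrate what "equal pressure" could mean here, the following is
a minimal userspace sketch, not kernel code.  The struct group type,
its fields, and reclaim_round_robin() are hypothetical names made up
for this example; the batch size and the use of the soft limit as the
selection criterion are assumptions as well.  Instead of hammering
the group with the largest excess, each group over its limit gives up
at most one batch per pass.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup; not the kernel's struct. */
struct group {
	long usage;      /* pages currently charged to the group */
	long soft_limit; /* pages the group may keep under pressure */
};

/* Pages by which a group exceeds its soft limit, or 0 if it does not. */
static long group_excess(const struct group *g)
{
	return g->usage > g->soft_limit ? g->usage - g->soft_limit : 0;
}

/*
 * Visit the groups in turn and take at most 'batch' pages from each
 * one that has excess, until 'target' pages have been reclaimed or
 * no group has any excess left.  Returns the pages reclaimed.
 */
static long reclaim_round_robin(struct group *groups, size_t n,
				long target, long batch)
{
	long reclaimed = 0;
	int progress = 1;

	assert(batch > 0);

	while (reclaimed < target && progress) {
		progress = 0;
		for (size_t i = 0; i < n && reclaimed < target; i++) {
			long take = group_excess(&groups[i]);

			if (take > batch)
				take = batch;
			if (take > target - reclaimed)
				take = target - reclaimed;
			if (take == 0)
				continue;

			groups[i].usage -= take;
			reclaimed += take;
			progress = 1;
		}
	}
	return reclaimed;
}
```

With two groups that are 100 and 40 pages over their limits, a target
of 60 pages and a batch of 10, each group gives up 30 pages, rather
than the first group giving up all 60.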

> memcg-kswapd  : works for reducing usage of memory, no interest in
>                 zone/nodes. Starts when high/low watermarks are hit.

When the watermark is hit in the charge path, we want to wake up the
daemon to reclaim from a specific memcg.

When multiple memcgs exceed their watermarks in parallel (after all,
we DO allow concurrency), we again have a group of memcgs we want to
reclaim from in a fair fashion until their watermarks are met again.

And memcg reclaim is not oblivious to nodes and zones.  Right now, we
also mind the current node and respect the zone balancing when we do
direct reclaim on behalf of a memcg.

So, to be honest, I really don't see how both cases should be
independent from each other.  On the contrary, I see very little
difference between them.  The entry path differs slightly as well as
the predicate for the set of memcgs to scan.  But most of the worker
code is exactly the same, no?
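
As a rough sketch of that point (userspace C, with every name
invented for illustration): both entry paths below call the same
worker and differ only in the predicate that selects which groups to
scan.  The single per-group wmark field is a simplifying assumption
and does not model the high/low pair from the patch set.

```c
#include <stddef.h>

/* Hypothetical stand-in for a memory cgroup. */
struct group {
	long usage;      /* pages currently charged */
	long soft_limit; /* relevant under global memory pressure */
	long wmark;      /* assumed single per-group watermark */
};

/* Predicate deciding whether a group belongs to the set to scan. */
typedef int (*select_fn)(const struct group *g);

static int over_soft_limit(const struct group *g)
{
	return g->usage > g->soft_limit;
}

static int over_wmark(const struct group *g)
{
	return g->usage > g->wmark;
}

/*
 * Shared worker: one pass over all groups, taking up to 'batch'
 * pages from each group the predicate selects.  Returns the number
 * of pages reclaimed in this pass.
 */
static long scan_groups(struct group *groups, size_t n,
			select_fn selected, long batch)
{
	long reclaimed = 0;

	for (size_t i = 0; i < n; i++) {
		long take;

		if (!selected(&groups[i]))
			continue;

		take = groups[i].usage < batch ? groups[i].usage : batch;
		groups[i].usage -= take;
		reclaimed += take;
	}
	return reclaimed;
}

/* Entry path 1: global memory pressure, scan groups over soft limit. */
static long global_pressure_pass(struct group *groups, size_t n, long batch)
{
	return scan_groups(groups, n, over_soft_limit, batch);
}

/* Entry path 2: watermark hit, scan groups over their watermark. */
static long wmark_pass(struct group *groups, size_t n, long batch)
{
	return scan_groups(groups, n, over_wmark, batch);
}
```

Only the two thin wrappers know why reclaim was started; the scanning
loop itself is identical for both.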

> > > Two watermarks ("high_wmark", "low_wmark") are added to trigger the
> > > background reclaim and stop it. The watermarks are calculated based
> > > on the cgroup's limit_in_bytes.
> > 
> > Which brings me to the next issue: making the watermarks configurable.
> > 
> > You argued that having them adjustable from userspace is required for
> > overcommitting the hardlimits and per-memcg kswapd reclaim not kicking
> > in in case of global memory pressure.  But that is only a problem
> > because global kswapd reclaim is (apart from soft limit reclaim)
> > unaware of memory control groups.
> > 
> > I think the much better solution is to make global kswapd memcg aware
> > (with the above mentioned round-robin reclaim scheduler), compared to
> > adding new (and final!) kernel ABI to avoid an internal shortcoming.
> 
> I don't think it's a good idea to kick kswapd even when there is enough free memory.

This depends on what kswapd is supposed to be doing.  I am not saying
we should reclaim from all memcgs (i.e. globally) just because one
memcg hits its watermark, of course.

But the argument was that we need the watermarks configurable to force
per-memcg reclaim even when the hard limits are overcommitted, because
global reclaim does not do a fair job of balancing memcgs.  My
counterproposal is to fix global reclaim instead and apply equal pressure on
memcgs, such that we never have to tweak per-memcg watermarks to
achieve the same thing.

	Hannes
