Re: [RFC][PATCH 0/4] memcg: per cgroup background reclaim

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Ying Han <yinghan@xxxxxxxxxx> [2010-11-29 23:03:31]:

> On Mon, Nov 29, 2010 at 10:54 PM, KOSAKI Motohiro
> <kosaki.motohiro@xxxxxxxxxxxxxx> wrote:
> >> The current implementation of memcg only supports direct reclaim and this
> >> patchset adds the support for background reclaim. Per cgroup background
> >> reclaim is needed which spreads out the memory pressure over longer period
> >> of time and smoothes out the system performance.
> >>
> >> The current implementation is not a stable version, and it crashes sometimes
> >> on my NUMA machine. Before going further for debugging, I would like to start
> >> the discussion and hear the feedbacks of the initial design.
> >
> > I haven't read your code at all. However I agree your claim that memcg
> > also need background reclaim.
> 
> Thanks for your comment.
> >
> > So if you post high level design memo, I'm happy.
> 
> My high level design is kind of spreading out into each patch, and
> here is the consolidated one. This is nothing more but cluing all the
> commits' messages for the following patches.
> 
> "
> The current implementation of memcg only supports direct reclaim and this
> patchset adds the support for background reclaim. Per cgroup background
> reclaim is needed which spreads out the memory pressure over longer period
> of time and smoothes out the system performance.
> 
> There is a kswapd kernel thread for each memory node. We add a different kswapd
> for each cgroup. The kswapd is sleeping in the wait queue headed at kswapd_wait
> field of a kswapd descriptor. The kswapd descriptor stores information of node
> or cgroup and it allows the global and per cgroup background reclaim to share
> common reclaim algorithms. The per cgroup kswapd is invoked at mem_cgroup_charge
> when the cgroup's memory usage above a threshold--low_wmark. Then the kswapd
> thread starts to reclaim pages in a priority loop similar to global algorithm.
> The kswapd is done if the usage below a threshold--high_wmark.
>

So the logic is per-node/per-zone/per-cgroup right?
 
> The per cgroup background reclaim is based on the per cgroup LRU and also adds
> per cgroup watermarks. There are two watermarks including "low_wmark" and
> "high_wmark", and they are calculated based on the limit_in_bytes(hard_limit)
> for each cgroup. Each time the hard_limit is change, the corresponding wmarks
> are re-calculated. Since memory controller charges only user pages, there is

What about memsw limits, do they impact anything, I presume not.

> no need for a "min_wmark". The current calculation of wmarks is a function of
> "memory.min_free_kbytes" which could be adjusted by writing different values
> into the new api. This is added mainly for debugging purpose.

When you say debugging, can you elaborate?

> 
> The kswapd() function now is shared between global and per cgroup kswapd thread.
> It is passed in with the kswapd descriptor which contains the information of
> either node or cgroup. Then the new function balance_mem_cgroup_pgdat is invoked
> if it is per cgroup kswapd thread. The balance_mem_cgroup_pgdat performs a
> priority loop similar to global reclaim. In each iteration it invokes
> balance_pgdat_node for all nodes on the system, which is a new function performs
> background reclaim per node. After reclaiming each node, it checks
> mem_cgroup_watermark_ok() and breaks the priority loop if returns true. A per
> memcg zone will be marked as "unreclaimable" if the scanning rate is much
> greater than the reclaiming rate on the per cgroup LRU. The bit is cleared when
> there is a page charged to the cgroup being freed. Kswapd breaks the priority
> loop if all the zones are marked as "unreclaimable".
> "
> 
> Also, I am happy to add more descriptions if anything not clear :)
>

Thanks for explaining this in detail, it makes the review easier. 

-- 
	Three Cheers,
	Balbir

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxxx  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom policy in Canada: sign http://dissolvethecrtc.ca/
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>


[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]