Re: [RFC][PATCH] memcg: isolate pages in memcg lru from global lru

On Wed, Mar 30, 2011 at 7:25 PM, KAMEZAWA Hiroyuki <kamezawa.hiroyu@xxxxxxxxxxxxxx> wrote:
On Wed, 30 Mar 2011 17:48:18 -0700
Ying Han <yinghan@xxxxxxxxxx> wrote:

> In the memory controller, we do both targeted reclaim and global reclaim. The
> latter walks the global LRU, which links all the allocated pages on the
> system. This breaks memory isolation, since pages are evicted regardless of
> their memcg owners. This patch takes pages off the global LRU as long as
> they are on a per-memcg LRU.
>
> Memcg and cgroup together provide memory isolation, where multiple cgroups
> run in parallel without interfering with each other. In the VM, memory
> isolation requires changes in both page allocation and page reclaim. The
> current memcg provides good user page accounting, but needs more work on
> page reclaim.
>
> On an over-committed machine with 32G of RAM, here is the configuration:
>
> cgroup-A/  -- limit_in_bytes = 20G, soft_limit_in_bytes = 15G
> cgroup-B/  -- limit_in_bytes = 20G, soft_limit_in_bytes = 15G
>
> 1) limit_in_bytes is the hard limit: a process is throttled or OOM-killed
> when it goes over the limit.
> 2) memory between soft_limit and limit_in_bytes is best-effort. soft_limit
> provides a "guarantee" in some sense.
>
> Then it is easy to generate the following scenario:
>
> cgroup-A/  -- usage_in_bytes = 20G
> cgroup-B/  -- usage_in_bytes = 12G
>
> Global memory pressure triggers while cgroup-A keeps allocating memory. At
> this point, pages belonging to cgroup-B can be evicted from the global LRU,
> even though cgroup-B is below its soft_limit.
>
> We do have per-memcg targeted reclaim, including per-memcg background reclaim
> and soft_limit reclaim. Both of them need some improvement, and regardless we
> still need this patch, since global LRU reclaim breaks isolation.
>
> Besides, here is my to-do list for memcg page reclaim, in sorted order:
> a) per-memcg background reclaim, to reclaim pages proactively.
agree,

> b) skipping global LRU reclaim if soft_limit reclaim does enough work. This
> applies to both global background reclaim and global ttfp
> (try_to_free_pages) reclaim.

Agree, but zone balancing cannot be avoided for now. So I think we need
inter-zone page migration to balance memory between zones, if necessary.

Thank you for your comments. Can you clarify this a bit? I was actually thinking about zone balancing within a memcg, but haven't thought it through yet. I would like to learn more about the cases where we cannot avoid global zone balancing entirely.


> c) improve the soft_limit reclaim to be efficient.

must be done.

The current design of soft_limit focuses on correctness rather than efficiency. If we are talking about improving the efficiency of target reclaim, there is quite a lot to change. The first thing might be improving the per-zone RB tree. It is currently keyed on the per-memcg excess (usage - soft_limit), regardless of how many pages landed on the zone.
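
To illustrate the RB-tree point, here is a minimal userspace sketch, not kernel code. It assumes the current ordering picks the memcg with the largest excess of usage over soft_limit, and contrasts that with a hypothetical zone-aware ordering. The names `Memcg`, `victim_by_excess` and `victim_zone_aware`, and all the numbers, are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Memcg:
    name: str
    usage: int                       # bytes currently charged
    soft_limit: int                  # soft_limit_in_bytes
    # zone name -> pages this memcg has on that zone's LRU (illustrative)
    zone_pages: dict = field(default_factory=dict)

    @property
    def excess(self) -> int:
        # How far the group is over its soft limit (0 if under).
        return max(0, self.usage - self.soft_limit)


def victim_by_excess(memcgs):
    """Order by excess only; ignores where the pages actually live."""
    over = [m for m in memcgs if m.excess > 0]
    return max(over, key=lambda m: m.excess, default=None)


def victim_zone_aware(memcgs, zone):
    """Hypothetical alternative: only consider groups with pages in `zone`."""
    over = [m for m in memcgs
            if m.excess > 0 and m.zone_pages.get(zone, 0) > 0]
    return max(over, key=lambda m: m.zone_pages[zone], default=None)


GIB = 1 << 30
groups = [
    Memcg("A", usage=20 * GIB, soft_limit=15 * GIB,
          zone_pages={"Normal": 0, "DMA32": 500_000}),
    Memcg("B", usage=16 * GIB, soft_limit=15 * GIB,
          zone_pages={"Normal": 200_000}),
]

print(victim_by_excess(groups).name)             # A
print(victim_zone_aware(groups, "Normal").name)  # B
```

With these made-up numbers, reclaiming for zone "Normal" by excess alone selects A, which has no pages there, so the pass would make no progress on that zone.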

> d) isolate pages in memcg from global list since it breaks memory isolation.

I can never agree to this until a), b), and c) are fixed and we can go nowhere.

BTW, from another point of view: to reduce the size of page_cgroup, we must
remove ->lru from page_cgroup. If divide-and-conquer memory reclaim works well
enough, we can do that. But this is a big global VM change, so we need enough
justification.

I can agree on that. The change looks big, especially without efficient target reclaim. However,
I do believe we need this to have an isolation guarantee.



> I have run some basic tests on this patch; more tests are definitely needed:
>
> Functional:
> Two memcgs under root. cgroup-A is reading a 20g file with a 2g limit, and
> cgroup-B is running random stuff with a 500m limit. Check the counters for
> the per-memcg LRUs and the global LRU; they should add up.
>
> 1) total file pages
> $ cat /proc/meminfo | grep Cache
> Cached:          6032128 kB
>
> 2) file lru on global lru
> $ cat /proc/vmstat | grep file
> nr_inactive_file 0
> nr_active_file 963131
>
> 3) file lru on root cgroup
> $ cat /dev/cgroup/memory.stat | grep file
> inactive_file 0
> active_file 0
>
> 4) file lru on cgroup-A
> $ cat /dev/cgroup/A/memory.stat | grep file
> inactive_file 2145759232
> active_file 0
>
> 5) file lru on cgroup-B
> $ cat /dev/cgroup/B/memory.stat | grep file
> inactive_file 401408
> active_file 143360
>
> Performance:
> Run the page fault test (pft) with 16 threads, faulting in 15G of anon
> pages in a 16G cgroup. No regression was noticed in "flt/cpu/s".
>
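
The "should add up" check in the functional test above can be sketched as follows. The unit conversions are the main point: /proc/meminfo reports kB, /proc/vmstat reports pages, and memory.stat reports bytes. The 4 kB page size is an assumption, since the thread does not state it. The numbers are copied from the quoted output.

```python
PAGE_KB = 4  # assumption: 4 kB pages

cached_kb = 6032128                  # /proc/meminfo "Cached", in kB

# /proc/vmstat: nr_inactive_file + nr_active_file, in pages
global_file_pages = 0 + 963131

# memory.stat: inactive_file + active_file per group, in bytes
memcg_file_bytes = {
    "root": 0 + 0,
    "A": 2145759232 + 0,
    "B": 401408 + 143360,
}

global_kb = global_file_pages * PAGE_KB
memcg_kb = sum(memcg_file_bytes.values()) // 1024
total_kb = global_kb + memcg_kb
gap_kb = cached_kb - total_kb

print(f"global LRU : {global_kb} kB")   # 3852524 kB
print(f"memcg LRUs : {memcg_kb} kB")    # 2096000 kB
print(f"sum        : {total_kb} kB")    # 5948524 kB
print(f"gap        : {gap_kb} kB")      # 83604 kB
```

Under the 4 kB assumption the LRU counters account for all but about 82 MB of "Cached". The thread does not explain that remainder. It may simply be that "Cached" also covers pages that are not on a file LRU, or that the counters were sampled at slightly different times.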

You need a fix for /proc/meminfo, /proc/vmstat to count memcg's ;)

Yes. :) Since this is an RFC prototype, I took a shortcut and reused the existing stats, which now count only the pages on the global LRU.

Anyway, this seems too aggressive to me, for now. Please do a), b), and c) first.
 

IIUC, this patch itself can cause a livelock when the soft limit is misconfigured.
What is the protection against a wrong soft limit?

Hmm, can you help clarify that?

If we do this kind of LRU isolation, we'll need some limit on the sum of the
limits of all memcgs, to avoid misconfiguration. That may change the UI dramatically
(as the RT-class CPU-limiting cgroup does).
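
As a rough illustration of such a sum-of-limits check, here is a minimal sketch applied to the 32G configuration from the patch description. It is not an existing memcg interface: `check_limits` is a hypothetical helper, and whether the constraint would apply to hard limits, soft limits, or both is an assumption, since the thread does not settle it.

```python
GIB = 1 << 30


def check_limits(ram_bytes, groups):
    """groups maps name -> (limit_in_bytes, soft_limit_in_bytes).

    Returns whether the summed hard and soft limits exceed RAM.
    """
    hard_sum = sum(hard for hard, _ in groups.values())
    soft_sum = sum(soft for _, soft in groups.values())
    return {
        "hard_sum": hard_sum,
        "soft_sum": soft_sum,
        "hard_overcommitted": hard_sum > ram_bytes,
        "soft_overcommitted": soft_sum > ram_bytes,
    }


report = check_limits(
    32 * GIB,
    {
        "cgroup-A": (20 * GIB, 15 * GIB),
        "cgroup-B": (20 * GIB, 15 * GIB),
    },
)
print(report["hard_overcommitted"])  # True: 40G of hard limits on 32G of RAM
print(report["soft_overcommitted"])  # False: 30G of soft limits fit in 32G
```

In that example the hard limits are over-committed while the soft limits still fit in RAM, so a rule of this kind would only reject configurations whose soft limits sum to more than 32G.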

This sounds related to the question above, so I'll wait for that one to be answered :)


Anyway, thank you for the data.

sure

--Ying
 
Thanks,
-Kame



