On Wed, 30 Mar 2011 17:48:18 -0700
Ying Han <yinghan@xxxxxxxxxx> wrote:

> In memory controller, we do both targeting reclaim and global reclaim. The
> latter one walks through the global lru which links all the allocated pages
> on the system. It breaks the memory isolation since pages are evicted
> regardless of their memcg owners. This patch takes pages off global lru
> as long as they are added to per-memcg lru.
>
> Memcg and cgroup together provide the solution of memory isolation where
> multiple cgroups run in parallel without interfering with each other. In
> vm, memory isolation requires changes in both page allocation and page
> reclaim. The current memcg provides good user page accounting, but needs
> more work on the page reclaim.
>
> In an over-committed machine w/ 32G ram, here is the configuration:
>
> cgroup-A/ -- limit_in_bytes = 20G, soft_limit_in_bytes = 15G
> cgroup-B/ -- limit_in_bytes = 20G, soft_limit_in_bytes = 15G
>
> 1) limit_in_bytes is the hard_limit where process will be throttled or OOM
> killed by going over the limit.
> 2) memory between soft_limit and limit_in_bytes is best-effort. soft_limit
> provides "guarantee" in some sense.
>
> Then, it is easy to generate the following scenario where:
>
> cgroup-A/ -- usage_in_bytes = 20G
> cgroup-B/ -- usage_in_bytes = 12G
>
> The global memory pressure triggers while cgroup-A keeps allocating memory. At
> this point, pages belonging to cgroup-B can be evicted from global LRU.
>
> We do have per-memcg targeting reclaim including per-memcg background reclaim
> and soft_limit reclaim. Both of them need some improvement, and regardless we
> still need this patch since it breaks isolation.
>
> Besides, here is to-do list I have on memcg page reclaim and they are sorted.
> a) per-memcg background reclaim. to reclaim pages proactively

agree,

> b) skipping global lru reclaim if soft_limit reclaim does enough work. this is
> both for global background reclaim and global ttfp reclaim.

agree.
But zone-balancing cannot be avoided for now. So I think we need
inter-zone page migration to balance memory between zones, if necessary.

> c) improve the soft_limit reclaim to be efficient.

must be done.

> d) isolate pages in memcg from global list since it breaks memory isolation.
>

I will never agree to this until a), b), c) are fixed and we can go nowhere.

BTW, from another POV, to reduce the size of page_cgroup we must remove ->lru
from page_cgroup. If divide-and-conquer memory reclaim works well enough, we
can do that. But this is a big global VM change, so we need enough
justification.

> I have some basic test on this patch and more tests definitely are needed:
>
> Functional:
> two memcgs under root. cgroup-A is reading 20g file with 2g limit,
> cgroup-B is running random stuff with 500m limit. Check the counters for
> per-memcg lru and global lru, and they should add-up.
>
> 1) total file pages
> $ cat /proc/meminfo | grep Cache
> Cached: 6032128 kB
>
> 2) file lru on global lru
> $ cat /proc/vmstat | grep file
> nr_inactive_file 0
> nr_active_file 963131
>
> 3) file lru on root cgroup
> $ cat /dev/cgroup/memory.stat | grep file
> inactive_file 0
> active_file 0
>
> 4) file lru on cgroup-A
> $ cat /dev/cgroup/A/memory.stat | grep file
> inactive_file 2145759232
> active_file 0
>
> 5) file lru on cgroup-B
> $ cat /dev/cgroup/B/memory.stat | grep file
> inactive_file 401408
> active_file 143360
>
> Performance:
> run page fault test(pft) with 16 thread on faulting in 15G anon pages
> in 16G cgroup. There is no regression noticed on "flt/cpu/s"
>

You need a fix for /proc/meminfo and /proc/vmstat to count memcg's pages ;)

Anyway, this seems too aggressive to me, for now. Please do a), b), c) first.

IIUC, this patch itself can cause a livelock when softlimit is misconfigured.
What is the protection against a wrong softlimit?

If we do this kind of LRU isolation, we'll need some limit on the sum of the
limits of all memcgs to avoid wrong configuration.
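As a rough illustration only, here is what such a check on the sum of limits
could look like. This is a userspace Python sketch, not kernel code;
check_limits() and its arguments are hypothetical names made up for this mail,
and the numbers are the 32G configuration quoted above.

```python
GIB = 1 << 30  # bytes per GiB


def check_limits(total_ram, cgroups):
    """Compare the summed memcg limits against total RAM.

    `cgroups` maps a (hypothetical) cgroup name to a tuple of
    (limit_in_bytes, soft_limit_in_bytes).
    """
    hard_sum = sum(hard for hard, _soft in cgroups.values())
    soft_sum = sum(soft for _hard, soft in cgroups.values())
    return {
        "hard_sum": hard_sum,
        "soft_sum": soft_sum,
        # True when the limits cannot all be satisfied at the same time.
        "hard_overcommitted": hard_sum > total_ram,
        "soft_overcommitted": soft_sum > total_ram,
    }


# The configuration from the quoted patch description: 32G of RAM,
# two cgroups with a 20G hard limit and a 15G soft limit each.
config = {
    "cgroup-A": (20 * GIB, 15 * GIB),
    "cgroup-B": (20 * GIB, 15 * GIB),
}
result = check_limits(32 * GIB, config)
print(result)
```

With the quoted configuration the hard limits sum to 40G, which is more than
RAM, while the soft limits sum to 30G and still fit. A soft limit sum above
RAM is the kind of wrong configuration I am worried about.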
Such a limit may change the UI dramatically (as the RT-class cpu limiting
cgroup does).

Anyway, thank you for the data.

Thanks,
-Kame