On Fri 07-06-13 17:13:55, Piotr Nowojski wrote:
> On 06.06.2013 17:57, Michal Hocko wrote:
> >>>In our system we have hit some very annoying situation (bug?) with
> >>>cgroups. I'm writing to you, because I have found your posts on
> >>>mailing lists with similar topic. Maybe you could help us or point
> >>>some direction where to look for/ask.
> >>>
> >>>We have system with ~15GB RAM (+2GB SWAP), and we are running ~10
> >>>heavy IO processes. Each process is using constantly 200-210MB RAM
> >>>(RSS) and a lot of page cache. All processes are in cgroup with
> >>>following limits:
> >>>
> >>>/sys/fs/cgroup/taskell2 $ cat memory.limit_in_bytes
> >>>memory.memsw.limit_in_bytes
> >>>14183038976
> >>>15601344512
> >I assume that memory.use_hierarchy is 1, right?
> System has been rebooted since last test, so I can not guarantee
> that it was set for 100%, but it should have been. Currently I'm
> rerunning this scenario that led to the described problem with:
>
> /sys/fs/cgroup/taskell2# cat memory.use_hierarchy ../memory.use_hierarchy
> 1
> 0

OK, good. Your numbers suggest that the hierarchy _is_ in use. I just
wanted to be 100% sure.

[...]
> >The core thing to find out is why the hard limit reclaim is not able to
> >free anything. Unfortunately we do not have memcg reclaim statistics so
> >it would be a bit harder. I would start with the above patch first and
> >then I can prepare some debugging patches for you.
> I will try 3.6 (probably 3.7) kernel after weekend - unfortunately

I would simply try 3.9 (stable) and skip those two.

> repeating whole scenario is taking 10-30 hours because of very
> slowly growing page cache.

OK, this is good to know.

> >Also does 3.4 vanilla (or the stable kernel) behave the same way? Is the
> >current vanilla behaving the same way?
> I don't know, we are using standard kernel that comes from Ubuntu.

Yes, but I guess Ubuntu, like any other distro, puts some patches on top
of the vanilla kernel.

> >Finally, have you seen the issue for a longer time or it started showing
> >up only now?
> >
> This system is very new. We have started testing scenario which
> triggered OOM something like one week ago and we have immediately
> hit this issue. Previously, with different scenarios and different
> memory usage by processes we didn't have this issue.

OK
-- 
Michal Hocko
SUSE Labs
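
For anyone reproducing the setup discussed in this thread, the cgroup v1
values quoted in it can be collected with a small script. This is only a
sketch: it assumes the cgroup v1 memory controller is mounted and that the
group lives at the path shown in the thread. The function names read_memcg
and swap_headroom are illustrative and not part of any kernel tooling.

```python
from pathlib import Path

# cgroup v1 memory controller files named in the thread
FILES = (
    "memory.limit_in_bytes",
    "memory.memsw.limit_in_bytes",
    "memory.use_hierarchy",
)

# Path taken from the thread; adjust for your own group (assumption).
CGROUP_DIR = "/sys/fs/cgroup/taskell2"


def read_memcg(cgroup_dir):
    """Return {file name: int value} for those FILES that exist in cgroup_dir."""
    values = {}
    for name in FILES:
        path = Path(cgroup_dir) / name
        if path.is_file():
            values[name] = int(path.read_text().strip())
    return values


def swap_headroom(values):
    """Bytes of swap the group can use while its RAM usage sits at the hard limit.

    memory.memsw.limit_in_bytes bounds memory plus swap, so the headroom is
    the difference between the two limits.
    """
    return values["memory.memsw.limit_in_bytes"] - values["memory.limit_in_bytes"]


if __name__ == "__main__":
    vals = read_memcg(CGROUP_DIR)
    for name, value in vals.items():
        print(f"{name} = {value}")
    # Only report the headroom when both limits could be read.
    if {"memory.limit_in_bytes", "memory.memsw.limit_in_bytes"} <= vals.keys():
        print(f"swap headroom = {swap_headroom(vals)} bytes")
```

With the limits quoted in the thread (14183038976 and 15601344512), the
difference is 1418305536 bytes, roughly 1.3 GiB of swap on top of the RAM
limit.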