On Fri 12-09-14 10:58:33, Tejun Heo wrote:
> (cc'ing memcg maintainers and quoting whole body)
>
> On Thu, Sep 11, 2014 at 02:05:19PM +1200, Tyler Power wrote:
> > Hi there,
> >
> > Hopefully I'm sending this to the right place, this is the first time
> > I've reported a kernel bug. I'm roughly following this format here
> > https://www.kernel.org/pub/linux/docs/lkml/reporting-bugs.html.
> >
> > 1. The OOM killer kicks in to kill processes inside a cgroup that has
> > hit its memory limit but sometimes kills a process outside of the
> > cgroup
> >
> > 2. We've encountered an error on Ubuntu 12.04 running on vsphere with
> > kernel linux-image-3.13.0-32-generic as well as
> > linux-image-3.13.0-35-generic which causes the machine to hard lock
> > up. It is completely unresponsive until hard reset.

I am not very familiar with Ubuntu kernels, but are those kernels
applying any patches on top of 3.13? If yes, can you reproduce the
issue with the vanilla kernel? It would also be good to know whether
the same issue is reproducible with the current Linus tree.

[ 2634.867954] Task in /lxc/e177098cd5f95ff8dbfa1ea14667b0bdc525dfa2e1c1b3bf763acd0a7ef217a4 killed as a result of limit of /lxc/e177098cd5f95ff8dbfa1ea14667b0bdc525dfa2e1c1b3bf763acd0a7ef217a4
[ 2634.988982] Task in /lxc/e177098cd5f95ff8dbfa1ea14667b0bdc525dfa2e1c1b3bf763acd0a7ef217a4 killed as a result of limit of /lxc/e177098cd5f95ff8dbfa1ea14667b0bdc525dfa2e1c1b3bf763acd0a7ef217a4
[ 2635.101917] Task in /lxc/e177098cd5f95ff8dbfa1ea14667b0bdc525dfa2e1c1b3bf763acd0a7ef217a4 killed as a result of limit of /lxc/e177098cd5f95ff8dbfa1ea14667b0bdc525dfa2e1c1b3bf763acd0a7ef217a4
[ 2635.212105] Task in / killed as a result of limit of /lxc/e177098cd5f95ff8dbfa1ea14667b0bdc525dfa2e1c1b3bf763acd0a7ef217a4

So this is about the same memcg all the time (except for the last one,
which is obviously invalid). The OOM reports are suspicious, though:

[ 2634.922570] Memory cgroup out of memory: Kill process 15919 (java) score 904 or sacrifice child
[ 2634.924952] Killed process 15758 (bash) total-vm:11040kB, anon-rss:216kB, file-rss:416kB
[ 2635.041469] Memory cgroup out of memory: Kill process 15919 (java) score 904 or sacrifice child
[ 2635.043872] Killed process 15757 (bash) total-vm:11040kB, anon-rss:216kB, file-rss:392kB
[ 2635.150580] Memory cgroup out of memory: Kill process 15919 (java) score 906 or sacrifice child
[ 2635.153010] Killed process 15919 (java) total-vm:2205588kB, anon-rss:58444kB, file-rss:564kB
[ 2635.249819] Memory cgroup out of memory: Kill process 15861 (java) score 918 or sacrifice child

So we are always selecting 15919 but actually killing bash instead, at
least the first two times. The third time it is java itself that is
killed, and then things go south.
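For reference, here is a rough user-space model (not the actual kernel
code, with made-up task names and scores) of how, as far as I can tell,
the "Kill process X ... or sacrifice child" preference in
oom_kill_process() can end up killing a different task than the one
selected: the highest-scoring task is picked, but one of its children
with a distinct mm and a strictly higher badness score is sacrificed in
its place.

#include <stdio.h>

struct task {
	const char *comm;
	int pid;
	unsigned int score;		/* stand-in for oom_badness() */
	int mm_id;			/* stand-in for the mm pointer */
	struct task *children[4];	/* NULL-terminated, toy model only */
};

/* Pick the task to kill: the selected victim, or one of its children. */
static struct task *pick_victim(struct task *selected)
{
	struct task *victim = selected;

	for (int i = 0; selected->children[i]; i++) {
		struct task *child = selected->children[i];

		if (child->mm_id == selected->mm_id)
			continue;	/* shares the mm, killing it frees nothing */
		if (child->score > victim->score)
			victim = child;	/* sacrifice the child instead */
	}
	return victim;
}

int main(void)
{
	/* made-up scores, chosen only to exercise the child branch */
	struct task bash = { "bash", 15758, 950, 2, { NULL } };
	struct task java = { "java", 15919, 904, 1, { &bash, NULL } };
	struct task *victim = pick_victim(&java);

	printf("Kill process %d (%s) score %u or sacrifice child\n",
	       java.pid, java.comm, java.score);
	printf("Killed process %d (%s)\n", victim->pid, victim->comm);
	return 0;
}

Whether bash could really have been a higher-scoring child of java here
is another question.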
15919 is not listed as a memcg member:

[ 2634.888650] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[ 2634.891018] [15552]     0 15552    12511      732      29        0             0 sshd
[ 2634.893373] [15596]     0 15596     5180      276      15        0             0 cron
[ 2634.895686] [15731]     0 15731    19971      926      43        0             0 sshd
[ 2634.898004] [15735]  1014 15735    19971      395      40        0             0 sshd
[ 2634.900305] [15736]  1014 15736     2760      376      10        0             0 bash
[ 2634.902588] [15756]  1014 15756   551397    14730      92        0             0 java
[ 2634.904853] [15757]  1014 15757     2760      152      10        0             0 bash
[ 2634.907080] [15758]  1014 15758     2760      158      10        0             0 bash
[ 2634.909316] [15759]  1014 15759     1472      171       7        0             0 tee
[ 2634.911495] [15760]  1014 15760     1472      172       8        0             0 tee
[ 2634.913689] [15936]     0 15936    11535      338      28        0             0 cron
[ 2634.915905] [15937]     0 15937     1102      153       8        0             0 sh
[ 2634.918055] [15938]     0 15938     1102      153       8        0             0 maxlifetime
[ 2634.920385] [15940]     0 15940    53661     2029     105        0             0 php5

mem_cgroup_out_of_memory relies on css_task_iter to iterate through all
tasks (threads) belonging to a memcg. Memcg just makes sure that memcgs
under the target one are considered as well. So it might be possible
that a !thread_group_leader has been chosen; dump_tasks would then
ignore it. This alone wouldn't be a big deal.

How we could end up killing bash as a child doesn't make any sense to
me, though: first, children are killed only if they have a bigger
score, and second, bash as a child of java?

The 3.13 kernel didn't have 1da4db0cd5c8a, which mentions endless
loops. As the lockup was detected and we do not see a "Killed process
XYZ" message after the last report, it might be that we are still stuck
in the do {} while_each_thread() loop (a toy sketch of that failure
mode is appended below). This is called with preemption disabled, so
triggering the lockup detector would be quite natural if the loop
cannot finish.
--
Michal Hocko
SUSE Labs
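For completeness, a toy user-space sketch (not the kernel code, and
only a guess at the failure mode 1da4db0cd5c8a is about): a
do {} while_each_thread()-style walk stops only when it sees the
starting task again, so if that task is unlinked from the circular
thread list while the walk is in progress, the exit condition is never
met. The step counter below is a stand-in for the soft-lockup detector,
and all names are made up.

#include <stdio.h>

struct thread {
	int tid;
	struct thread *next;	/* circular, like the thread_group list */
};

int main(void)
{
	struct thread a = { 1, NULL }, b = { 2, NULL }, c = { 3, NULL };
	struct thread *g = &a, *t = &a;	/* g: walk start, as in while_each_thread(g, t) */
	unsigned long steps = 0;

	a.next = &b;
	b.next = &c;
	c.next = &a;			/* ring: a -> b -> c -> a */

	/* simulate the starting task exiting mid-walk: unlink 'a' from the ring */
	c.next = &b;

	do {
		/* ... the real loop would examine thread 't' here ... */
		if (++steps > 1000000) {	/* stand-in for the lockup detector */
			printf("never saw the starting task again after %lu steps\n",
			       steps);
			return 1;
		}
	} while ((t = t->next) != g);

	printf("walk finished after %lu steps\n", steps);
	return 0;
}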