On Tue 12-07-16 08:35:06, Shayan Pooya wrote:
> >> With strace, when running 500 concurrent mem-hog tasks on the same
> >> kernel, 33 of them failed with:
> >>
> >> strace: ../sysdeps/nptl/fork.c:136: __libc_fork: Assertion
> >> `THREAD_GETMEM (self, tid) != ppid' failed.
> >>
> >> Which is: https://sourceware.org/bugzilla/show_bug.cgi?id=15392
> >> And discussed before at: https://lkml.org/lkml/2015/2/6/470 but that
> >> patch was not accepted.
> >
> > OK, so the problem is that the oom killed task doesn't report the futex
> > release properly? If yes then I fail to see how that is memcg specific.
> > Could you try to clarify what you consider a bug again, please? I am not
> > really sure I understand this report.
>
> It looks like it is just a very easy way to reproduce the problem that
> Konstantin described in that lkml thread. That patch was not accepted
> and I see no other fixes for that issue upstream. Here is a copy of
> his root-cause analysis from said thread:
>
> Whole sequence looks like: task calls fork, glibc calls syscall clone with
> CLONE_CHILD_SETTID and passes pointer to TLS THREAD_SELF->tid as argument.
> Child task gets read-only copy of VM including TLS. Child calls put_user()
> to handle CLONE_CHILD_SETTID from schedule_tail(). put_user() trigger page
> fault and it fails because do_wp_page() hits memcg limit without invoking
> OOM-killer because this is page-fault from kernel-space. Put_user returns
> -EFAULT, which is ignored. Child returns into user-space and catches here
> assert (THREAD_GETMEM (self, tid) != ppid), glibc tries to print something
> but hangs on deadlock on internal locks. Halt and catch fire.

OK, I see! Thanks for the clarification. So the bug is that put_user
return value is ignored. Let's see whether Konstantin's patch will be
accepted or Oleg comes with something else.
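To make the sequence above concrete, here is a rough user-space model of
it. This is only a sketch and not kernel or glibc code: model_put_user(),
model_child_tid() and model_fork_assert_holds() are hypothetical names
invented for illustration, and the fault_fails flag simply stands in for
the write fault that cannot be resolved once the memcg limit is hit.

```c
#include <assert.h>  /* only needed for the usage checks shown below */
#include <errno.h>

/* Hypothetical stand-in for put_user(): store value at dst, or return
 * -EFAULT when the simulated write fault cannot be resolved. */
static int model_put_user(int value, int *dst, int fault_fails)
{
    if (fault_fails)
        return -EFAULT;
    *dst = value;
    return 0;
}

/* Models the child side of fork(). tls_tid starts out as a copy of the
 * parent's cached tid; the CLONE_CHILD_SETTID store is supposed to
 * overwrite it. The return value is deliberately dropped, as described
 * in the analysis. Returns the tid the child ends up seeing. */
static int model_child_tid(int parent_tid, int child_tid, int fault_fails)
{
    int tls_tid = parent_tid;  /* child's copy of the parent's TLS */

    (void)model_put_user(child_tid, &tls_tid, fault_fails);
    return tls_tid;
}

/* Mirrors the shape of the glibc check: nonzero means the assertion
 * THREAD_GETMEM (self, tid) != ppid would hold in the child. */
static int model_fork_assert_holds(int parent_tid, int child_tid,
                                   int fault_fails)
{
    return model_child_tid(parent_tid, child_tid, fault_fails) != parent_tid;
}
```

With made-up tids 100 (parent) and 101 (child),
model_fork_assert_holds(100, 101, 0) is nonzero, while
model_fork_assert_holds(100, 101, 1) is zero: the child still sees the
parent's tid because the failed store went unnoticed.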
-- 
Michal Hocko
SUSE Labs