Tejun Heo <tj@xxxxxxxxxx> writes:

> Hello,
>
> On Mon, Jul 29, 2013 at 11:51:09AM +0200, Michal Hocko wrote:
>> Isn't this a bug in freezer then? I am not familiar with the freezer
>> much but memcg oom handling seems correct to me. The task is sleeping
>> KILLABLE and fatal_signal_pending in mem_cgroup_handle_oom will tell us
>> to bypass the charge and let the task go away.
>
> Is the problem a frozen task not being killed even when SIGKILL is
> received? If so, it is a known problem and a side-effect of
> cgroup_freezer (ab)using and making the existing power management
> freezer visible to userland without really thinking about the
> implications. :(

Something like that.  I need to look at it in a little more detail.

The idiom someone adopted to atomically kill all of the tasks in a
cgroup is to:

  1. Freeze all of the tasks.
  2. Send them SIGKILL.
  3. Unfreeze all of the tasks.

(There is a rough sketch of that sequence at the end of this mail.)

The freezing actually fails in this case, so I don't know what is
happening.  So this is not a simple matter of a frozen task not dying
when SIGKILL is received.  For the most part, not dying when SIGKILL
is received seems like correct behavior for a frozen task.  Certainly
it is correct behavior for any other signal.  The issue is that the
tasks don't freeze, or that when thawed the SIGKILL is still ignored.
It seems a wake up is being missed in there somewhere.

> So, yeah, if you use cgroup_freezer now, the tasks will get stuck in
> states which aren't well defined when visible from userland and will
> just stay there until unfrozen no matter what. Yet another reason
> I'll be screaming like a banshee at anyone who says that cgroup is
> built to delegate subtree access rights to !root users.

Yes.  From the looks of it the cgroup implementation is rather badly
borked right now, and definitely not up to the standards of the other
core pieces of the kernel.  That is one of the reasons I was rather
appalled when systemd started using them.  Until the code actually
works reliably and the races are removed, most people's systems would
be much better off with cgroups compiled out.

A single unified hierarchy is a really nasty idea for the same set of
reasons.  You have to recompile to disable a controller to see if that
controller's bugs are what is causing problems on your production
system.  A recompile, or even just a reboot, is a very heavy hammer to
ask people to use when they are triaging a problem.

That said, semantically, having more than single-process controls for
all of user space is very desirable.  Until we have code that is safe
to use, giving it any additional exposure seems like a bad idea.

> It's on the to-do list but a very long term one. Right now, if you
> combine userland OOM handling with freezer and whatnot, it'd be pretty
> easy to get into trouble.

Thanks for the heads up.  Right now this is looking like a regression,
but it might just be that my test machine has the right combination of
racing pixies to trigger a long standing bug.

I am also seeing what looks like a leak somewhere in the cgroup code as
well.  After some runs of the same reproducer I get into a state where,
after everything is cleaned up (all of the control groups have been
removed and the cgroup filesystem is unmounted), I can mount a cgroup
filesystem with that same combination of subsystems, but I can't mount
a cgroup filesystem with any of those subsystems in any other
combination.  So I am guessing that the superblock from the original
mounting is still lingering for some reason.  (A small program
illustrating that symptom is also at the end of this mail.)
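For reference, the memcg behavior Michal describes boils down to the
standard killable-sleep pattern: the task sleeps in TASK_KILLABLE and
fatal_signal_pending() breaks the wait so a SIGKILLed task can bail out
(for memcg, bypass the charge).  The sketch below is a generic,
illustrative form of that pattern only; the helper name and the done()
callback are made up, and this is not the actual mem_cgroup_handle_oom().

#include <linux/sched.h>
#include <linux/wait.h>

/*
 * Illustrative killable sleep: wait for done(), but let a pending
 * SIGKILL break out of the wait so the caller can bail / bypass.
 */
static int sleep_killable_until(wait_queue_head_t *wq, bool (*done)(void))
{
	DEFINE_WAIT(wait);
	int ret = 0;

	for (;;) {
		prepare_to_wait(wq, &wait, TASK_KILLABLE);
		if (done())
			break;
		if (fatal_signal_pending(current)) {
			ret = -EINTR;	/* caller treats this as "bail out" */
			break;
		}
		schedule();
	}
	finish_wait(wq, &wait);
	return ret;
}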
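Here is a rough sketch of the freeze -> SIGKILL -> thaw idiom above,
assuming a freezer hierarchy mounted somewhere like
/sys/fs/cgroup/freezer.  The paths and helper names are just for
illustration and error handling is minimal; it is meant to show the
shape of the sequence, not any particular userspace implementation.

#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int write_str(const char *path, const char *val)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	fputs(val, f);
	return fclose(f);
}

/* cg is a freezer cgroup directory, e.g. /sys/fs/cgroup/freezer/jail */
static int kill_cgroup(const char *cg)
{
	char path[4096], state[32] = "";
	FILE *f;
	int pid, tries;

	/* 1. Ask the freezer to freeze every task in the group. */
	snprintf(path, sizeof(path), "%s/freezer.state", cg);
	if (write_str(path, "FROZEN"))
		return -1;

	/*
	 * Freezing is asynchronous, so wait until the state really reads
	 * FROZEN.  (This is the step that is failing in my case.)
	 */
	for (tries = 0; tries < 100; tries++) {
		f = fopen(path, "r");
		if (!f)
			return -1;
		if (fscanf(f, "%31s", state) != 1)
			state[0] = '\0';
		fclose(f);
		if (!strcmp(state, "FROZEN"))
			break;
		usleep(10000);
	}
	if (strcmp(state, "FROZEN"))
		return -1;

	/* 2. SIGKILL every member while none of them can run or fork. */
	snprintf(path, sizeof(path), "%s/cgroup.procs", cg);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fscanf(f, "%d", &pid) == 1)
		kill(pid, SIGKILL);
	fclose(f);

	/* 3. Thaw the group so the pending SIGKILLs get delivered. */
	snprintf(path, sizeof(path), "%s/freezer.state", cg);
	return write_str(path, "THAWED");
}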
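And a small program illustrating the mount symptom described above.
"freezer,memory" is only an example combination, not necessarily the
one the reproducer used, and /mnt/a is just an arbitrary empty
directory.

#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
	/*
	 * Run after every control group has been removed and the original
	 * cgroup filesystem unmounted.  Remounting the exact combination
	 * that was used before still works...
	 */
	if (mount("cgroup", "/mnt/a", "cgroup", 0, "freezer,memory"))
		perror("mount freezer,memory");
	else
		umount("/mnt/a");

	/* ...but any other combination of those subsystems now fails. */
	if (mount("cgroup", "/mnt/a", "cgroup", 0, "memory"))
		perror("mount memory");
	else
		umount("/mnt/a");

	return 0;
}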
Eric