Re: [PATCH v8 0/7] freezer for cgroup v2

Roman Gushchin <guro@xxxxxx> · Thu, 21 Feb 2019 22:43:58 +0000

On Thu, Feb 21, 2019 at 05:29:24PM +0100, Oleg Nesterov wrote:
> On 02/20, Roman Gushchin wrote:
> >
> > On Wed, Feb 20, 2019 at 03:37:48PM +0100, Oleg Nesterov wrote:
> > >
> > > I tried to not argue with intent, but to be honest I am more and more
> > > sceptical... Lets forget about ptrace for the moment.
> > >
> > > Once again, why do we want a killable freezer?
> > >
> > > If a user wants to kill a frozen task from CGRP_FROZEN cgroup he can simply
> > >
> > > 	1. send SIGKILL to that task
> > >
> > > 	2. migrate it to the root cgroup.
> > >
> > > why this doesn't / can't work?
> >
> > It does work, but it doesn't look as a nice interface to take into
> > the cgroup v2 world.
> >
> > It just not clear, why killing a frozen task requires some cgroup-level
> > operations? It doesn't add anything except some additional complexity
> > to the userspace.
> 
> Yes.
> 
> But to me this is a reasonable trade-off because this way we do not add
> additional complexity to the kernel.
> 
> Actually, "killable" is not that difficult afaics. "ptraceable" looks more
> problematic to me. Again, user-space can do
> 
> 	1. PTRACE_SEIZE
> 	2. move the tracee to the root cgroup
> 	3. do anything with the tracee
> 	4. move it back
> 
> > Generally speaking, any process hanging in D-state
> > for a long time isn't the nicest object from the userspace's point of view.
> 
> Roman, this is unfair comparison ;)

Why not? This is exactly the point, with v1 freezer you get a task in D state,
which isn't manageable by userspace without some actions with sysfs.

> 
> > Exactly as a SIGSTOPped process can be killed without sending SIGCONT,
> > why a frozen task would require some additional operations?
> 
> this too,
> 
> > And I'm not talking about the case, when the process which is sending
> > SIGKILL has no write access to cgroupfs.
> 
> True.
> 
> But there is another case. If admin wants to freeze a cgroup then it is not
> clear why a user which can send SIGKILL to a frozen process should wake it up.

But it will woken up only for a short moment. And the cgroup will remain frozen
in a sense that belonging processes do not consume CPU.
Actually, a signal can be useful: for example, if it was the last process,
the container management software might want to delete the cgroup.

> 
> ------------------------------------------------------------------------------
> Again, it is not that I hate the idea of killable/ptraceable freezer. Just I
> personally think it's not worth the trouble. Perhaps I am wrong, but so far
> I do not see a good implementation...
> 
> And, apart from reading/writing the registers, what can ptrace do with a frozen
> tracee? This doesn't look like a "must have" feature to me.

I think the minimal requirement is that the tracing application should not hang
and wait for tracee to be unfrozen.
So, imagine you're trying to debug an application in production with gdb,
and occasionally gdb just hangs because some cluster management stuff froze
the tracee's cgroup. Not the best user experience.

> 
> At least, may I ask you again to make (if possible) a separate patch which adds
> the ability to kill/ptrace?

I'll try, but not sure if it can make the code easier for review.
It looks like this ability defines the implementation.

> ------------------------------------------------------------------------------
> 
> > > Why I am starting to argue... The ability to kill a frozen task complicates
> > > the code, and since cgroup_enter_stopped() (in this version at least) doesn't
> > > properly interacts with freezable_schedule() leads to other problems.
> > >
> > > From 7/7:
> > >
> > > 	+  cgroup.freeze
> > > 	+	A read-write single value file which exists on non-root cgroups.
> > > 	+	Allowed values are "0" and "1". The default is "0".
> > > 	+
> > > 	+	Writing "1" to the file causes freezing of the cgroup and all
> > > 	+	descendant cgroups. This means that all belonging processes will
> > > 	+	be stopped and will not run until the cgroup will be explicitly
> > > 	+	unfrozen. Freezing of the cgroup may take some time;
> > >                                                  ^^^^^^^^^^^^^^^^^^
> > > it may take infinite time.
> > >
> > > Just suppose that a task does vfork() and this races with cgroup_do_freeze(true).
> > > If the new child notices JOBCTL_TRAP_FREEZE before exit/exec the cgroup will be
> > > never frozen.
> >
> > Hm, why? cgroup_update_frozen() called from cgroup_post_fork() should bring
> > the cgroup into the frozen state. If it's not true (I'm missing some race here),
> > it's a bug, but I don't see why it's not possible in general.
> 
> A task P calls vfork() and creates the new child C. Now, how can the parent P
> (which sleeps in TASK_KILLABLE) call cgroup_enter_stopped() ? It can't until C
> exits or execs. C can't exit or exec because it is frozen.

Got it. I'll address it in the next version.

Thanks!