Re: [PATCH v2 3/6] cgroup: cgroup v2 freezer

Hi Oleg!

On Tue, Nov 13, 2018 at 04:48:25PM +0100, Oleg Nesterov wrote:
> On 11/12, Roman Gushchin wrote:
> >
> > --- a/include/linux/sched.h
> > +++ b/include/linux/sched.h
> > @@ -83,7 +83,8 @@ struct task_group;
> >  #define TASK_WAKING			0x0200
> >  #define TASK_NOLOAD			0x0400
> >  #define TASK_NEW			0x0800
> > -#define TASK_STATE_MAX			0x1000
> > +#define TASK_FROZEN			0x1000
> > +#define TASK_STATE_MAX			0x2000
> 
> Just noticed the new task state... Why? Can't we avoid it?

We can, but it's nice to show userspace that tasks are frozen,
rather than just stuck somewhere in the kernel...

> 
> ...
> 
> > +void cgroup_freezer_enter(void)
> > +{
> > +	long state = current->state;
> 
> Why? it must be TASK_RUNNING?
> 
> If not set_current_state() at the end is simply wrong... Yes, __refrigerator()
> does this, but at least it has a reason although it is wrong too.
> 
> > +	struct cgroup *cgrp;
> > +
> > +	if (!current->frozen) {
> > +		spin_lock_irq(&css_set_lock);
> > +		current->frozen = true;
> > +		cgrp = task_dfl_cgroup(current);
> > +		cgrp->freezer.nr_frozen_tasks++;
> > +
> > +		WARN_ON_ONCE(cgrp->freezer.nr_tasks_to_freeze <
> > +			     cgrp->freezer.nr_frozen_tasks);
> > +
> > +		if (cgrp->freezer.nr_tasks_to_freeze ==
> > +		    cgrp->freezer.nr_frozen_tasks)
> > +			cgroup_queue_notify_frozen(cgrp);
> > +		spin_unlock_irq(&css_set_lock);
> > +	}
> > +
> > +	/* refrigerator */
> > +	set_current_state(TASK_WAKEKILL | TASK_INTERRUPTIBLE | TASK_FROZEN);
> 
> Why not __set_current_state() ?

Hm, it's not a hot path at all, so set_current_state() is good enough.
Not a strong preference, of course.

> 
> If ->state include TASK_INTERRUPTIBLE, why do we need TASK_WAKEKILL?
> 
> And again, why TASK_FROZEN?

So, should it be just TASK_INTERRUPTIBLE | TASK_FROZEN ?
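
For reference, here is a small userspace model of the check I have in mind when saying TASK_WAKEKILL adds nothing on top of TASK_INTERRUPTIBLE. It is only a sketch: the function mirrors my reading of signal_pending_state(), the `pending` and `fatal` arguments are stand-ins I made up for signal_pending() and __fatal_signal_pending(), and the bit values are the ones I believe sched.h uses.

```c
#include <stdbool.h>

/* State bits; values assumed to match include/linux/sched.h. */
#define TASK_INTERRUPTIBLE	0x0001
#define TASK_UNINTERRUPTIBLE	0x0002
#define TASK_WAKEKILL		0x0100

/*
 * Userspace model of the signal_pending_state() logic.
 * 'pending' and 'fatal' are hypothetical stand-ins for
 * signal_pending(p) and __fatal_signal_pending(p).
 */
static bool model_signal_pending_state(long state, bool pending, bool fatal)
{
	/* Neither interruptible nor killable: signals never end the sleep. */
	if (!(state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	if (!pending)
		return false;
	/* Interruptible: any signal counts. Killable only: fatal ones. */
	return (state & TASK_INTERRUPTIBLE) || fatal;
}
```

In this model, TASK_INTERRUPTIBLE and TASK_INTERRUPTIBLE | TASK_WAKEKILL give the same answer for every combination of `pending` and `fatal`; TASK_WAKEKILL only changes the result when paired with TASK_UNINTERRUPTIBLE.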

> 
> > +	clear_thread_flag(TIF_SIGPENDING);
> > +	schedule();
> > +	recalc_sigpending();
> 
> I simply can't understand these 3 lines above but I bet this is not correct ;)

So, yeah, the problem is that if the TIF_SIGPENDING bit is set, schedule()
returns immediately, so we get pretty much a busy loop here.
The clear_thread_flag() call is a nasty workaround for that.

I believe we can clear the flag and not call recalc_sigpending() at all. Does
that seem correct?
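
To make the busy-loop concern concrete, here is a toy userspace model, not kernel code. Everything in it (struct toy_task, toy_schedule(), toy_freezer_loop(), the max_spins cap) is hypothetical and only illustrates the behaviour described above: with a signal pending, an interruptible sleep returns at once, so a caller that retries while still frozen just spins.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant bits of task state. */
struct toy_task {
	bool sigpending;	/* models TIF_SIGPENDING */
	bool frozen;		/* models "the cgroup is still frozen" */
};

/*
 * Toy schedule(): with a signal pending, an interruptible task is not
 * put to sleep and the call returns right away (returns false).
 * Otherwise the task "sleeps" until thawed; the thaw is modelled by
 * clearing ->frozen before returning true.
 */
static bool toy_schedule(struct toy_task *t)
{
	if (t->sigpending)
		return false;
	t->frozen = false;
	return true;
}

/* Returns the number of wasted passes, capped at max_spins. */
static int toy_freezer_loop(struct toy_task *t, bool clear_sigpending,
			    int max_spins)
{
	int spins = 0;

	while (t->frozen && spins < max_spins) {
		if (clear_sigpending)
			t->sigpending = false;
		if (!toy_schedule(t))
			spins++;
	}
	return spins;
}
```

Starting from { .sigpending = true, .frozen = true }, the loop burns all max_spins passes when clear_sigpending is false and none when it is true. The model says nothing about whether dropping the flag is safe with respect to signal delivery, which is the part I'm unsure about.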

> 
> if nothing else recalc_sigpending() without ->siglock is wrong, it can race
> with signal_wakeup/etc.
> 
> > +	set_current_state(state);
> 
> see above...

Thank you for the review!
And looking forward to more comments from you!



