Re: [PATCH 3/3 cgroup/for-5.2-fixes] cgroup: Include dying leaders with live threads in PROCS iterations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



 H Michal,

On Fri, Jan 10, 2020 at 3:16 PM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello Suren.
>
> On Fri, Jan 10, 2020 at 01:47:19PM -0800, Suren Baghdasaryan <surenb@xxxxxxxxxx> wrote:
> > > On Fri, Jun 07, 2019 at 07:09:53PM +0200, Michal Koutný wrote:
> > > > Wouldn't it make more sense to call
> > > >       css_set_move_task(tsk, cset, NULL, false);
> > > > in cgroup_release instead of cgroup_exit then?
> > > >
> > > > css_set_move_task triggers the cgroup emptiness notification so if we
> > > > list group leaders with running siblings as members of the cgroup (IMO
> > > > correct), is it consistent to deliver the (un)populated event
> > > > that early?
> > > > By moving to cgroup_release we would also make this notification
> > > > analogous to SIGCHLD delivery.
> > >
> > > So, the current behavior is mostly historical and I don't think this
> > > is a better behavior.  That said, I'm not sure about the cost benefit
> > > ratio of changing the behavior at this point given how long the code
> > > has been this way.  Another aspect is that it'd expose zombie tasks
> > > and possibly migrations of them to each controller.
> >
> > Sorry for reviving an old discussion
> Since you reply to my remark, I have to share that I found myself wrong
> later wrt the emptiness notifications. Moving the task in cgroup_exit
> doesn't matter if thread group contains other live tasks, the unpopulated
> notification will be raised when the last task of thread group calls
> group_exit, i.e. it is similar to SIGHLD.
>
> Now to your issue.
>
> > but I got a bug report from a customer about Android
> What kernel version is that? Can be it considered equal to the current
> Linux?

I reproduced this with 4.19 kernel that had Tejun's patches backported
but I believe the same behavior will happen with any kernel that
contains c03cd7738a83 ("cgroup: Include dying leaders with live
threads in PROCS iterations") including the latest. I'll try to
confirm this on 5.4 next week.

>
> > In my testing I was able to reproduce the failure in
> > which case rmdir() fails with EBUSY from cgroup_destroy_locked()
> > because cgroup_is_populated() returns true.
> That'd mean that not all tasks in the cgroup did exit() (cgroup_exit()),
> i.e. they're still running. Can you see them in cgroup.threads/tasks?

Correct, I confirmed that there are still tasks in cset->tasks list
when this happens. Actually when I reproduced this I logged one task
in cset->dying_tasks which was the group leader and another one in
cset->tasks. So the last task to reach cgroup_exit() was not the group
leader.
However these tasks will be ignored when we enumerate them using
css_task_iter_start()/css_task_iter_next() loop because
css_task_iter_advance() ignores tasks with PF_EXITING flag set or
group leaders with task->signal->live==0 (see the code lines I posted
in my first email). So threads are there, they are PF_EXITING but at
least one of them did not reach cgroup_exit() yet. When user space
reads cgroup.procs/tasks they are skipped. This is the change in
behavior which is different from what it used to be before this patch.
Prior to it these tasks would be visible even if they are exiting.

>
> > cgroup_is_populated() depends on cgrp->nr_populated_xxx counters which
> > IIUC will not be updated until cgroup_update_populated() is called
> > from cgroup_exit() and that might get delayed...
> Why do you think it's delayed?

I think exit_mm() can be one of the contributors if the process owns a
considerable amount of memory but there might be other possible
reasons.

> > So from user space's point of view the cgroup is empty and can be
> > removed but rmdir() fails to do so.
> As Tejun says, cgroup with only dead _tasks_ should be removable (and if
> I'm not mistaken it is in the current kernel). Unless you do individual
> threads migration when a thread would be separated from its (dead)
> leader. Does your case include such migrations?
>
> Michal

Thanks,
Suren.




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [Monitors]

  Powered by Linux