Re: [Bug Report] EBUSY for cgroup rmdir after cgroup.procs empty

"T.J. Mercier" <tjmercier@xxxxxxxxxx> · Wed, 11 Oct 2023 16:57:49 -0700

On Tue, Oct 10, 2023 at 10:14 AM T.J. Mercier <tjmercier@xxxxxxxxxx> wrote:
>
> On Tue, Oct 10, 2023 at 9:31 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
> >
> > On Fri, Oct 06, 2023 at 11:37:19AM -0700, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote:
> > > I suppose it's also possible there is PID reuse by the same app,
> > > causing the cgroup to become repopulated at the same time as a kill,
> > > but that seems extremely unlikely. Plus, at the point where these
> > > kills are occurring we shouldn't normally be simultaneously launching
> > > new processes for the app. Similarly if a process forks right before
> > > it is killed, maybe it doesn't show up in cgroup.procs until after
> > > we've observed it to be empty?
> >
> > Something like this:
> >
> >                                                         kill (before)
> > cgroup_fork
> > cgroup_can_fork .. begin(threadgroup_rwsem)
> > tasklist_lock
> > fatal_signal_pending -> cgroup_cancel_fork              kill (mid)
> > tasklist_unlock
> >                                                         seq_start,
> >                                                         seq_next...
> >
> > cgroup_post_fork  .. end(threadgroup_rwsem)
> >                                                         kill (after)
> >
> > Only the third option `kill (after)` means the child would end up on the
> > css_set list. But that would mean the reader squeezed before
> > cgroup_post_fork() would still see the parent.
> > (I.e. I don't see the kill/fork race could skew the listed procs.)
> >
> So here is a trace from a phone where the kills happen (~100ms) after
> the forks. All but one of the children die before we read cgroup.procs
> for the first time, and cgroup.procs is not empty. 5ms later we read
> again and cgroup.procs is empty, but the last child still hasn't
> exited. So it makes sense that the cset from that last child is still
> on the list.
> https://pastebin.com/raw/tnHhnZBE
>
I collected a bit more info. The process disappears from cgroup.procs
before exit_mm, but the transition to populated=0 lags behind, and CPU
contention during that window seems to make the delay worse. What's
weird is that when the rmdir is attempted, I can see the task that is
blocking it on cgrp->cset_links->cset->tasks inside
cgroup_destroy_locked, so I don't understand why it doesn't show up
when iterating tasks for cgroup.procs.
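
Since the rmdir appears to track the populated state rather than what
cgroup.procs lists, one possible userspace workaround is to wait for
populated=0 in cgroup.events and retry on EBUSY. Below is a minimal
sketch in Python, assuming cgroup v2 (where cgroup.events carries a
"populated" key). The function names, the timeout and interval values,
and the injectable rmdir parameter are all hypothetical and not taken
from any existing code.

```python
import errno
import os
import time


def parse_populated(events_text):
    """Return True/False for the 'populated' key in cgroup.events text,
    or None if the key is absent."""
    for line in events_text.splitlines():
        key, _, value = line.partition(" ")
        if key == "populated":
            return value.strip() == "1"
    return None


def rmdir_when_unpopulated(cgroup_dir, timeout=5.0, interval=0.005,
                           rmdir=os.rmdir):
    """Poll cgroup.events until populated=0, then try to rmdir.

    Retries if rmdir fails with EBUSY. Returns True once the rmdir
    succeeds, False if the timeout expires first. The 'rmdir' argument
    is a hypothetical hook so the removal call can be replaced in tests.
    """
    deadline = time.monotonic() + timeout
    events_path = os.path.join(cgroup_dir, "cgroup.events")
    while True:
        with open(events_path) as f:
            populated = parse_populated(f.read())
        if populated is False:
            try:
                rmdir(cgroup_dir)
                return True
            except OSError as e:
                # Only EBUSY is treated as transient; re-raise the rest.
                if e.errno != errno.EBUSY:
                    raise
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Polling keeps the sketch short. A real implementation would more
likely wait for modification events on cgroup.events (for example via
poll or inotify) instead of sleeping in a loop.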

I'll be out until next Wednesday and will look into this further then.