Re: [Bug Report] EBUSY for cgroup rmdir after cgroup.procs empty

On Fri, Oct 6, 2023 at 9:58 AM Michal Koutný <mkoutny@xxxxxxxx> wrote:
>
> Hello T.J.
>
> A curious case.
>
> I was staring at the code, and every scenario I could come up with would
> imply css_set_lock doesn't work.
>
> OTOH, I can bring the reproducer to rmdir()=-EBUSY on my machine
> (6.4.12-1-default) [1].
>
> I notice that there are 2*nr_cpus parallel readers of cgroup.procs.
> And a single thread's testimony is enough to consider the cgroup empty.
> Could it be that despite the 200ms delay, some of the threads still see
> the cgroup as empty?
> (I didn't do my own tracing, but by reducing the delay I could reduce the
> time before EBUSY was hit; otherwise it took several minutes (on top of
> a desktop background workload).)
>
Hm yes, it's possible a thread runs before the child migrates and sets
noProcs = true too early. I added a loop to wait for B to be populated
before running the threads, and now I can't reproduce it. :\
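
Roughly, the wait is something like this (sketched here as a poll of B's
cgroup.events for "populated 1"; the exact check in my reproducer isn't
important, only that the readers don't start before the child shows up):

#include <chrono>
#include <filesystem>
#include <fstream>
#include <string>
#include <thread>

// Returns true once the kernel reports "populated 1" in cgroup.events,
// i.e. at least one live process exists in the cgroup's subtree.
static bool cgroupPopulated(const std::filesystem::path& cg)
{
    std::ifstream events(cg / "cgroup.events");
    std::string line;
    while (std::getline(events, line)) {
        if (line == "populated 1")
            return true;
    }
    return false;
}

static void waitUntilPopulated(const std::filesystem::path& cg)
{
    while (!cgroupPopulated(cg))
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
}

// Called as waitUntilPopulated(CG_B_PATH) before spawning the reader threads.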

> On Tue, Oct 03, 2023 at 11:01:46AM -0700, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote:
> ...
> > > The trace events look like this when the problem occurs. I'm guessing
> > > the rmdir is attempted in that window between signal_deliver and
> > > cgroup_notify_populated = 0.
>
> But rmdir() happens after empty cgroup.procs was spotted, right?
> (That's why it is curious.)
>
Right: we read cgroup.procs to find which processes to kill, kill them
all, wait until cgroup.procs is empty, and then attempt the rmdir.
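
In pseudo-C++ the sequence is something like the following (a simplified
sketch of the shape of it, not the actual Android code; the helper names
are made up):

#include <chrono>
#include <filesystem>
#include <fstream>
#include <thread>
#include <vector>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

// Read the pids currently listed in the cgroup's cgroup.procs file.
static std::vector<pid_t> readProcs(const std::filesystem::path& cg)
{
    std::vector<pid_t> pids;
    std::ifstream procs(cg / "cgroup.procs");
    pid_t pid;
    while (procs >> pid)
        pids.push_back(pid);
    return pids;
}

// Kill everything in the cgroup, wait for cgroup.procs to read back empty,
// then remove the directory. The rmdir() at the end is the call that
// sporadically returns EBUSY even though the previous read saw no pids.
static int killAndRemove(const std::filesystem::path& cg)
{
    for (pid_t pid : readProcs(cg))
        kill(pid, SIGKILL);

    while (!readProcs(cg).empty())
        std::this_thread::sleep_for(std::chrono::milliseconds(10));

    return rmdir(cg.c_str());
}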

I will try changing the cgroup_rmdir trace event to always fire
instead of only when the rmdir succeeds. That way I can get a more
complete timeline.
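
In the meantime, just streaming the existing cgroup events from tracefs
gives a partial picture. A throwaway reader like this is enough (assuming
tracefs is mounted at /sys/kernel/tracing and it runs as root), though a
failed rmdir still won't show up until the event itself is changed:

#include <fstream>
#include <iostream>
#include <string>

int main()
{
    // Enable all events in the cgroup group (cgroup_rmdir,
    // cgroup_notify_populated, ...) and make sure tracing is on.
    std::ofstream("/sys/kernel/tracing/events/cgroup/enable") << "1\n";
    std::ofstream("/sys/kernel/tracing/tracing_on") << "1\n";

    // Stream events as they arrive.
    std::ifstream pipe("/sys/kernel/tracing/trace_pipe");
    std::string line;
    while (std::getline(pipe, line))
        std::cout << line << '\n';
}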

> > > However on Android we retry the rmdir for 2 seconds after cgroup.procs
> > > is empty and we're still occasionally hitting the failure. On my
> > > primary phone with 3 days of uptime I see a handful of cases, but the
> > > problem is orders of magnitude worse on Samsung's device.
>
> Would there also be short-lived members of cgroups and reading
> cgroup.procs under load?
>
I think the only short-lived members should be due to launch failures
or crashes. Reading cgroup.procs does frequently happen under load. One
scenario that comes to mind is under memory pressure, where LMKD hunts
for apps to kill (after which their cgroups are removed) while reclaim
and compaction are also occurring.

I suppose it's also possible there is PID reuse by the same app,
causing the cgroup to become repopulated at the same time as a kill,
but that seems extremely unlikely. Plus, at the point where these
kills are occurring we shouldn't normally be simultaneously launching
new processes for the app. Similarly, if a process forks right before
it is killed, maybe the child doesn't show up in cgroup.procs until
after we've observed the file to be empty?

I will investigate some more on a phone where I'm seeing this, since
my reproducer isn't doing the trick.

Thanks for taking a look, Michal.

>
> Thanks,
> Michal
>
> [1] FTR, a hunk to run it without sudo on a modern desktop:
> -static const std::filesystem::path CG_A_PATH = "/sys/fs/cgroup/A";
> -static const std::filesystem::path CG_B_PATH = "/sys/fs/cgroup/B";
> +static const std::filesystem::path CG_A_PATH = "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/a";
> +static const std::filesystem::path CG_B_PATH = "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/b";
>



