Re: [Bug Report] EBUSY for cgroup rmdir after cgroup.procs empty

Michal Koutný <mkoutny@xxxxxxxx> · Fri, 6 Oct 2023 18:58:46 +0200

Hello T.J.

A curious case.

I was staring at the code, and every way I could think of for this to
happen would imply that css_set_lock doesn't work.

OTOH, I can get the reproducer to hit rmdir() = -EBUSY on my machine
(6.4.12-1-default) [1].

I notice that there are 2*nr_cpus parallel readers of cgroup.procs,
and a single thread's testimony is enough to consider the cgroup empty.
Could it be that, despite the 200ms delay, some of the threads _still_
see the cgroup as empty?
(I didn't do my own tracing, but by reducing the delay I could reduce the
time before EBUSY was hit; otherwise it took several minutes (on top of
desktop background load).)
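
To make the concern concrete, here is a minimal sketch of how I read the
reader side. It is not the reproducer's actual code: procs_empty() and
any_reader_sees_empty() are hypothetical names, and the only property
modelled is that one reader's "empty" observation settles the verdict.

```cpp
#include <atomic>
#include <filesystem>
#include <fstream>
#include <thread>
#include <vector>

// Hypothetical helper: true if cgroup.procs under `cg` lists no PIDs.
// An unreadable file counts as "not empty", so an open error is never
// mistaken for an empty cgroup.
static bool procs_empty(const std::filesystem::path& cg) {
    std::ifstream procs(cg / "cgroup.procs");
    if (!procs.is_open())
        return false;
    long pid;
    return !(procs >> pid);  // extraction fails when no PID is present
}

// Hypothetical model of the parallel readers: each thread reads
// cgroup.procs once, and the cgroup is reported empty as soon as ANY
// single reader saw no PIDs.
static bool any_reader_sees_empty(const std::filesystem::path& cg,
                                  unsigned nr_readers) {
    std::atomic<bool> seen_empty{false};
    std::vector<std::thread> readers;
    for (unsigned i = 0; i < nr_readers; ++i) {
        readers.emplace_back([&seen_empty, &cg] {
            if (procs_empty(cg))
                seen_empty.store(true);
        });
    }
    for (auto& t : readers)
        t.join();
    return seen_empty.load();
}
```

With an any-of rule like this, one reader with an out-of-date view is
enough to trigger the rmdir(), whatever the other readers observed.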

On Tue, Oct 03, 2023 at 11:01:46AM -0700, "T.J. Mercier" <tjmercier@xxxxxxxxxx> wrote:
...
> > The trace events look like this when the problem occurs. I'm guessing
> > the rmdir is attempted in that window between signal_deliver and
> > cgroup_notify_populated = 0.

But rmdir() happens after empty cgroup.procs was spotted, right?
(That's why it is curious.)
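
For reference, the userspace sequence I have in mind is sketched below.
This is an illustration under my own assumptions, not the Android or
reproducer code; cgroup_procs_empty() and rmdir_after_empty() are
hypothetical names.

```cpp
#include <cerrno>
#include <filesystem>
#include <fstream>
#include <unistd.h>

// Hypothetical helper: true if cgroup.procs under `cg` lists no PIDs.
// An unreadable file is treated as "not empty".
static bool cgroup_procs_empty(const std::filesystem::path& cg) {
    std::ifstream procs(cg / "cgroup.procs");
    if (!procs.is_open())
        return false;
    long pid;
    return !(procs >> pid);
}

// Hypothetical helper: attempt rmdir() only after cgroup.procs read as
// empty. Returns 0 on success, -1 if a PID was still listed (or the
// file could not be read), otherwise the errno reported by rmdir().
static int rmdir_after_empty(const std::filesystem::path& cg) {
    if (!cgroup_procs_empty(cg))
        return -1;
    // The curious case: the read above saw no PIDs, yet rmdir() on the
    // cgroup can still fail with EBUSY at this point.
    if (::rmdir(cg.c_str()) == 0)
        return 0;
    return errno;
}
```

A return value of EBUSY from rmdir_after_empty() is exactly the outcome
that should not be possible if the empty read is trustworthy.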

> > However on Android we retry the rmdir for 2 seconds after cgroup.procs
> > is empty and we're still occasionally hitting the failure. On my
> > primary phone with 3 days of uptime I see a handful of cases, but the
> > problem is orders of magnitude worse on Samsung's device.

Would there also be short-lived cgroup members, with cgroup.procs being
read under load?

Thanks,
Michal

[1] FTR, a hunk to run it without sudo on a modern desktop:
-static const std::filesystem::path CG_A_PATH = "/sys/fs/cgroup/A";
-static const std::filesystem::path CG_B_PATH = "/sys/fs/cgroup/B";
+static const std::filesystem::path CG_A_PATH = "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/a";
+static const std::filesystem::path CG_B_PATH = "/sys/fs/cgroup/user.slice/user-1000.slice/user@1000.service/app.slice/b";
