Hello,

On Mon, Sep 13, 2021 at 04:20:59PM +0200, Christian Brauner wrote:
> Afaict, there is currently no way to prevent the deletion of empty
> cgroups, especially newly created ones. So for example, if I have a
> cgroup manager that prunes the cgroup tree whenever it detects empty
> cgroups, it can delete cgroups that were pre-allocated. This is
> something we have run into before.

systemd doesn't mess with cgroups behind a delegation point.

> A related problem is a crashed or killed container manager
> (segfault, sigkill, etc.). It might not have had the chance to clean up
> cgroups it allocated for the container. If the container manager is
> restarted it can't reuse the existing cgroup it found because it has no
> way of guaranteeing whether in between the time it crashed and got
> restarted another program has just created a cgroup with the same name.
> We usually solve this by just creating another cgroup with an index
> appended until we find an unallocated one, setting an arbitrary cutoff
> point after which we require manual intervention by the user (e.g. 1000).
>
> Right now iirc, one can rmdir() an empty cgroup while someone still
> holds a file descriptor open for it. This can lead to a situation where a
> cgroup got created but before moving into the cgroup (via clone3() or
> write()) someone else has deleted it. What would already be helpful is
> if one had a way to prevent the deletion of cgroups when someone still
> has an open reference to it. This would allow a pool of cgroups to be
> created that can't simply be deleted.

The above are problems common to any entity managing a cgroup
hierarchy. Beyond the permission and delegation based access control,
cgroup doesn't have a mechanism to grant exclusive managerial
operations to a specific application. It's the userspace's
responsibility to coordinate these operations, as with most other
kernel interfaces.

Thanks.

-- 
tejun
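For readers following along, the index-appending workaround described in the quoted text can be sketched in C roughly as below. This is a minimal illustration under stated assumptions, not code from any real container manager: the function name create_fresh_cgroup, the "-N" suffix format and the 1000 cutoff are hypothetical. It relies only on mkdir() either creating the directory or failing with EEXIST, so a successful call shows this process created the cgroup rather than picking up one that a crashed manager, or some other program, left behind.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical cutoff; the mail only says "e.g. 1000". */
#define CGROUP_SUFFIX_CUTOFF 1000

/*
 * Create a fresh cgroup directory below `parent`, trying "name",
 * then "name-1", "name-2", ... up to the cutoff (hypothetical scheme).
 *
 * mkdir() either creates the directory or fails with EEXIST, so a
 * successful call proves this process created that cgroup. It does NOT
 * stop another process from rmdir()ing it afterwards.
 *
 * Returns the suffix used (0 = no suffix) and writes the full path to
 * `out`, or returns -1 with errno set.
 */
static int create_fresh_cgroup(const char *parent, const char *name,
                               char *out, size_t outlen)
{
    assert(parent && name && out);
    if (name[0] == '\0' || strchr(name, '/')) {
        errno = EINVAL;              /* must be a single path component */
        return -1;
    }

    for (int i = 0; i <= CGROUP_SUFFIX_CUTOFF; i++) {
        int n = (i == 0)
            ? snprintf(out, outlen, "%s/%s", parent, name)
            : snprintf(out, outlen, "%s/%s-%d", parent, name, i);
        if (n < 0 || (size_t)n >= outlen) {
            errno = ENAMETOOLONG;    /* path did not fit in `out` */
            return -1;
        }
        if (mkdir(out, 0755) == 0)
            return i;                /* we created this cgroup ourselves */
        if (errno != EEXIST)
            return -1;               /* e.g. EACCES: not a name collision */
    }
    errno = EEXIST;                  /* cutoff reached: manual intervention */
    return -1;
}
```

A caller would then move a task into the returned path, for example via clone3() with CLONE_INTO_CGROUP or by writing a PID to the new directory's cgroup.procs file. Note that nothing in this sketch prevents another process from removing the still-empty cgroup before that happens; per the reply above, that coordination remains userspace's job.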