Re: [RFC] cgroup: removing css reference drain wait during cgroup removal

Matt Helsley <matthltc@xxxxxxxxxx> · Tue, 13 Mar 2012 14:45:26 -0700

On Mon, Mar 12, 2012 at 02:31:55PM -0700, Tejun Heo wrote:
> Hello, guys.
> 
> While working on blkcg, I learned that cgroup removal path tries to
> drain all internal references synchronously before proceeding with
> removal, and may be aborted by pre_destroy() failing.
> 
> I find both quite unusual.  While there are some occasions where we
> try to drain reference counts synchronously, the norm is deactivating
> the target and then releasing it when the reference count hits zero,
> and exposing this synchronous behavior directly to userland makes it
> worse.  This also requires allowing rmdir to be aborted from userland.
> 
> pre_destroy() being allowed to veto rmdir might be okay if there's
> only one subsystem using it or if it implemented a proper
> prepare-commit/cancel transaction, but as it currently stands,
> pre_destroy() operations are not reversible and even within memcg
> itself the state wouldn't be consistent after failure (some moved to
> parent while the rest on child).  Note that abort from userland also
> has the same problem.
> 
> It also complicates and adds even more subtleties to cgroup code.  A
> lot of it was just me being dumb but making sense of
> cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir() usages in
> memcontrol took me quite some time even with Hugh's help.
> 
> In general, IMHO, it's a bad idea to expose purely internal
> implementation details to userland directly.  Internal ref counts can
> be kept around for whatever reason (e.g. blkcg does it for lookup
> caching); such details shouldn't be visible to userland.  Midlayers
> like cgroup which sit between userland and mechanism implementations
> should provide isolation between the two so that each mechanism
> implementation doesn't have to worry about things like that.
> 
> It seems cgroup is going through a lot of the same growing pains that
> sysfs went through years ago and would probably benefit from using
> sysfs for userland interfacing rather than trying to replicate
> features that sysfs already provides.  Well, that's another long term
> thing, I guess.  For now, I'd like to make cgroup rmdir path more
> conventional so that rmdir behaves like the following.

If you want to spend your time doing archaeology, there are some old threads
that touch on this idea (roughly around 2003-2005). One point against the
idea that I distinctly recall:

Somewhat like configfs, object lifetimes in cgroups are determined
primarily by the user whereas sysfs object lifetimes are primarily
determined by the kernel. I think the closest we come to user-determined
objects in sysfs is through debugfs and module loading/unloading.
However those involve mount/umount and modprobe/rmmod rather than
mkdir/rmdir to create and remove the objects.
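
As an aside, the conventional pattern from the quoted text (deactivate the
target, then release it when the reference count hits zero) can be sketched
in userspace C roughly as below. This is only an illustration under stated
assumptions: struct obj and the obj_*() helpers are hypothetical names, not
cgroup or css code, and the locking or RCU that a real lookup path would need
is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical object; not a real cgroup or css structure. */
struct obj {
	atomic_int  refcnt;	/* starts at 1: the base ref held by the directory */
	atomic_bool dead;	/* set once removal has been requested */
	int        *freed;	/* demo hook: incremented when the object is released */
};

static struct obj *obj_create(int *freed)
{
	struct obj *o = malloc(sizeof(*o));

	if (!o)
		return NULL;
	atomic_init(&o->refcnt, 1);
	atomic_init(&o->dead, false);
	o->freed = freed;
	return o;
}

/*
 * Take a new reference; refused once the object has been deactivated.
 * Assumption: the caller already knows the pointer is still valid.
 */
static bool obj_tryget(struct obj *o)
{
	if (atomic_load(&o->dead))
		return false;
	atomic_fetch_add(&o->refcnt, 1);
	return true;
}

/* Drop a reference; whoever drops the last one releases the object. */
static void obj_put(struct obj *o)
{
	if (atomic_fetch_sub(&o->refcnt, 1) == 1) {
		++*o->freed;
		free(o);
	}
}

/* rmdir-style removal: deactivate, drop the base ref, return at once. */
static void obj_remove(struct obj *o)
{
	atomic_store(&o->dead, true);
	obj_put(o);
}

/* Walk through one removal; returns how many objects were released. */
static int demo(void)
{
	int freed = 0;
	struct obj *o = obj_create(&freed);
	bool got;

	if (!o)
		return -1;
	got = obj_tryget(o);	/* an internal user (say, a lookup cache) holds a ref */
	assert(got);
	obj_remove(o);		/* removal neither waits for that ref nor fails */
	assert(freed == 0);	/* still alive: one internal ref remains */
	got = obj_tryget(o);	/* no new references after deactivation */
	assert(!got);
	obj_put(o);		/* last ref dropped: the object is released here */
	return freed;
}
```

Here demo() returns 1: the object outlives obj_remove() for as long as the
internal reference is held, and the removal itself never has to wait or be
aborted.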

Cheers,
	-Matt Helsley
