On Mon, Mar 12, 2012 at 02:31:55PM -0700, Tejun Heo wrote:
> Hello, guys.
>
> While working on blkcg, I learned that cgroup removal path tries to
> drain all internal references synchronously before proceeding with
> removal, and may be aborted by pre_destroy() failing.
>
> I find both quite unusual. While there are some occasions where we
> try to drain reference counts synchronously, the norm is deactivating
> the target and then releasing it when the reference count hits zero
> and exposing this synchronous behavior directly to userland makes it
> worse. This also requires allowing rmdir to be aborted from userland.
>
> pre_destroy() being allowed to veto rmdir might be okay if there's
> only one subsystem using it or it implemented proper
> prepare-commit/cancel transaction, but as it currently stands,
> pre_destroy() operations are not reversible and even within memcg
> itself the state wouldn't be consistent after failure (some moved to
> parent while the rest on child). Note that abort from userland also
> has the same problem.
>
> It also complicates and adds even more subtleties to cgroup code. A
> lot of it was just me being dumb but making sense of
> cgroup_exclude_rmdir() and cgroup_release_and_wakeup_rmdir() usages in
> memcontrol took me quite some time even with Hugh's help.
>
> In general, IMHO, it's a bad idea to expose purely internal
> implementation details to userland directly. Internal ref counts can
> be kept around for whatever reason (e.g. blkcg does it for lookup
> caching), such details shouldn't be visible to userland. Midlayers
> like cgroup which sit between userland and mechanism implementations
> should provide isolation between the two so that each mechanism
> implementation doesn't have to worry about things like that.
>
> It seems cgroup is going through a lot of the same growing pains that
> sysfs went through years ago and would probably benefit from using
> sysfs for userland interfacing rather than trying to replicate
> features that sysfs already provides. Well, that's another long term
> thing, I guess. For now, I'd like to make cgroup rmdir path more
> conventional so that rmdir behaves like the following.

If you want to spend your time doing archaeology, there are some old
threads that touch on this idea (roughly around 2003-2005). One point
against the idea that I distinctly recall:

Somewhat like configfs, object lifetimes in cgroups are determined
primarily by the user, whereas sysfs object lifetimes are primarily
determined by the kernel. I think the closest we come to
user-determined objects in sysfs is through debugfs and module
loading/unloading. However, those involve mount/umount and
modprobe/rmmod rather than mkdir/rmdir to create and remove the
objects.

Cheers,
	-Matt Helsley
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers