On Tue, Jun 2, 2015 at 10:50 PM, Tejun Heo <tj@xxxxxxxxxx> wrote: > Hello, John. > > On Tue, Jun 02, 2015 at 12:07:23PM -0700, John Stultz wrote: >> On Wed, May 20, 2015 at 8:41 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote: >> > As a heads up, this is just a first RFC and not a submission. >> > >> > Android currently loosens the cgroup attchment permissions, allowing >> > tasks with CAP_SYS_NICE to be able to allow tasks to move arbitrary >> > tasks across cgroups. >> > >> > At first glance, overloading CAP_SYS_NICE seems a bit hackish, but this >> > shows that there is a active and widely deployed use for different cgroup >> > attachment rules then what is currently available. >> > >> > I've tried to rework the patches so this attachment policy is build >> > time configurable, and wanted to send them out for review so folks >> > might give their thoughts on this implementation and what they might >> > see as a better way to go about achieving the same goal. >> > >> > Thoughts and feedback would be appriciated! >> >> Ping? Not sure if I hit folks at a busy time or if I didn't cc the right folks? > > Can you explain why this is necessary? android's cgroups usage is > pretty weird at this point. IIRC, it moves tasks between fg and bg > cgroups with memory immigration turned on which basically forces memcg > to walk every single page relabeling them on each fg/bg switch, which > is a pretty silly thing to do and for this android kernel also removes > a rcu sync operation which makes break things on removal of cgroups, > which android doesn't do. So Colin and Rom could likely do a better job explaining this then me, but my understanding is that from the scheduling side, they use cgroups to bound cpu usage of tasks in various states (ie: foreground applications, background applications, audio application, system audio, and system tasks). This allows for things like audio applications to be SCHED_FIFO but not run-away hogging infinite cpu, and background task cpu usage to be similarly limited. The migration of a task from the foreground to background, or to elevate a task to audio priority, may be done by system service that does not run as root. So this patch allows processes with CAP_SYS_NICE to be able to migrate tasks between cgroups. I suspect if there was a specific cap (CAP_SYS_CHANGE_CGROUP) for this, it would be usable here, but they just overloaded CAP_SYS_NICE. One example of this: https://android.googlesource.com/platform/frameworks/base/+/b267554/services/java/com/android/server/os/SchedulingPolicyService.java As for the move_charge_at_immigrate, that does seem to be true, and I'm not sure at this moment of the details (Rom, Colin?), but I suspect it has to do with the memgc mempressure notifiers. And at least in the current android-3.18 kernel, I don't see any rcu sync modifications. But maybe I'm missing what you mean? > memcg usage came up a while ago and there wasn't anything major which > can't be achieved (usually better) by following more standard cgroup > usage - changing knobs rather than moving tasks around. Do you have a pointer to that discussion or maybe even just a sense of who was involved so I can trawl the list and better understand it? > Given that, I'm not sure this patchset makes sense for upstream. Right, I'm not suggesting the patch go in "as-is". I just wanted to show what Android is currently using, and start a discussion of what would be a better approach for Android to use, or what the kernel might need to be able to support Android's use case. thanks -john -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html