On Mon, Oct 17, 2016 at 3:35 PM, John Stultz <john.stultz@xxxxxxxxxx> wrote: > This patch adds CAP_GROUP_MIGRATE and logic to allows a process > to migrate other tasks between cgroups. > > In Android (where this feature originated), the ActivityManager tracks > various application states (TOP_APP, FOREGROUND, BACKGROUND, SYSTEM, > etc), and then as applications change states, the SchedPolicy logic > will migrate the application tasks between different cgroups used > to control the different application states (for example, there is a > background cpuset cgroup which can limit background tasks to stay > on one low-power cpu, and the bg_non_interactive cpuctrl cgroup can > then further limit those background tasks to a small percentage of > that one cpu's cpu time). > > However, for security reasons, Android doesn't want to make the > system_server (the process that runs the ActivityManager and > SchedPolicy logic), run as root. So in the Android common.git > kernel, they have some logic to allow cgroups to loosen their > permissions so CAP_SYS_NICE tasks can migrate other tasks between > cgroups. > > The approach taken there overloads CAP_SYS_NICE a bit much, and > is maybe more complicated then needed. > > So this patch, as suggested by Tejun, simply adds a new process > capability flag (CAP_CGROUP_MIGRATE), and uses it when checking > if a task can migrate other tasks between cgroups. > > I've tested this with AOSP master (though its a bit hacked in as I > still need to properly get the selinux bits aware of the new > capability bit) with selinux set to permissive and it seems to be > working well. > > Thoughts and feedback would be appreciated! > > Cc: Tejun Heo <tj@xxxxxxxxxx> > Cc: Li Zefan <lizefan@xxxxxxxxxx> > Cc: Jonathan Corbet <corbet@xxxxxxx> > Cc: cgroups@xxxxxxxxxxxxxxx > Cc: Android Kernel Team <kernel-team@xxxxxxxxxxx> > Cc: Rom Lemarchand <romlem@xxxxxxxxxxx> > Cc: Colin Cross <ccross@xxxxxxxxxxx> > Cc: Dmitry Shmidt <dimitrysh@xxxxxxxxxx> > Cc: Ricky Zhou <rickyz@xxxxxxxxxxxx> > Cc: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx> > Cc: Todd Kjos <tkjos@xxxxxxxxxx> > Cc: Christian Poetzsch <christian.potzsch@xxxxxxxxxx> > Cc: Amit Pundir <amit.pundir@xxxxxxxxxx> > Cc: Serge E. Hallyn <serge@xxxxxxxxxx> > Cc: linux-api@xxxxxxxxxxxxxxx > Signed-off-by: John Stultz <john.stultz@xxxxxxxxxx> > --- > v2: Renamed to just CAP_CGROUP_MIGRATE as reccomended by Tejun > --- > include/uapi/linux/capability.h | 5 ++++- > kernel/cgroup.c | 3 ++- > 2 files changed, 6 insertions(+), 2 deletions(-) > > diff --git a/include/uapi/linux/capability.h b/include/uapi/linux/capability.h > index 49bc062..44d7ff4 100644 > --- a/include/uapi/linux/capability.h > +++ b/include/uapi/linux/capability.h > @@ -349,8 +349,11 @@ struct vfs_cap_data { > > #define CAP_AUDIT_READ 37 > > +/* Allow migrating tasks between cgroups */ > > -#define CAP_LAST_CAP CAP_AUDIT_READ > +#define CAP_CGROUP_MIGRATE 38 > + > +#define CAP_LAST_CAP CAP_CGROUP_MIGRATE > > #define cap_valid(x) ((x) >= 0 && (x) <= CAP_LAST_CAP) > > diff --git a/kernel/cgroup.c b/kernel/cgroup.c > index 85bc9be..09f84d2 100644 > --- a/kernel/cgroup.c > +++ b/kernel/cgroup.c > @@ -2856,7 +2856,8 @@ static int cgroup_procs_write_permission(struct task_struct *task, > */ > if (!uid_eq(cred->euid, GLOBAL_ROOT_UID) && > !uid_eq(cred->euid, tcred->uid) && > - !uid_eq(cred->euid, tcred->suid)) > + !uid_eq(cred->euid, tcred->suid) && > + !ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE)) > ret = -EACCES; This logic seems rather confused to me. Without this patch, a user can write to procs if it's root *or* it matches the target uid *or* it matches the target suid. How does this make sense? How about ptrace_may_access(...) || ns_capable(tcred->user_ns, CAP_CGROUP_MIGRATE)? --Andy -- To unsubscribe from this list: send the line "unsubscribe cgroups" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html