On Mon, Nov 18, 2019 at 6:04 PM Prakash Sangappa
<prakash.sangappa@xxxxxxxxxx> wrote:
> Allow CAP_SYS_NICE to take effect for processes having effective uid of a
> root user from init namespace.
[...]
> @@ -4548,6 +4548,8 @@ int can_nice(const struct task_struct *p, const int nice)
>         int nice_rlim = nice_to_rlimit(nice);
>
>         return (nice_rlim <= task_rlimit(p, RLIMIT_NICE) ||
> +               (ns_capable(__task_cred(p)->user_ns, CAP_SYS_NICE) &&
> +               uid_eq(current_euid(), GLOBAL_ROOT_UID)) ||
>                 capable(CAP_SYS_NICE));

I very strongly dislike tying such a feature to GLOBAL_ROOT_UID.
Wouldn't it be better to control this through procfs, similar to
uid_map and gid_map? If you really need an escape hatch to become
privileged outside a user namespace, then I'd much prefer a file
"cap_map" that lets someone with appropriate capabilities in the outer
namespace write a bitmask of capabilities that should have effect
outside the container, or something like that. And limit that to bits
where that's sane, like CAP_SYS_NICE.

If we tie features like this to GLOBAL_ROOT_UID, more people are going
to run their containers with GLOBAL_ROOT_UID. Which is a terrible,
terrible idea. GLOBAL_ROOT_UID gives you privilege over all sorts of
files that you shouldn't be able to access, and only things like mount
namespaces and possibly LSMs prevent you from exercising that
privilege. GLOBAL_ROOT_UID should only ever be given to processes that
you trust completely.
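
To make the "cap_map" suggestion a bit more concrete, here is a rough
userspace model of the checks I have in mind. Nothing here is an
existing kernel interface: the sketch_* names, the struct and the
allow-list are all made up for illustration, and only the capability
numbers are taken from <linux/capability.h>. Treat it as a sketch of
the semantics under those assumptions, not as a proposed
implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define SKETCH_CAP_DAC_OVERRIDE 1
#define SKETCH_CAP_SYS_NICE     23

#define SKETCH_CAP_BIT(cap) (UINT64_C(1) << (cap))

/*
 * Hypothetical allow-list: the only bits that may ever be forwarded
 * out of a user namespace. Only CAP_SYS_NICE is listed here.
 */
#define SKETCH_CAP_MAP_ALLOWED SKETCH_CAP_BIT(SKETCH_CAP_SYS_NICE)

/* Hypothetical per-namespace state backing the "cap_map" file. */
struct sketch_userns {
	uint64_t cap_map;
};

/*
 * Models a write to "cap_map". The write is rejected if it asks for a
 * bit outside the allow-list, or for a bit the writer does not itself
 * hold in the outer namespace. Returns 0 on success, -1 on rejection.
 */
static inline int sketch_cap_map_write(struct sketch_userns *ns,
				       uint64_t requested,
				       uint64_t writer_outer_caps)
{
	if (requested & ~SKETCH_CAP_MAP_ALLOWED)
		return -1;
	if (requested & ~writer_outer_caps)
		return -1;
	ns->cap_map = requested;
	return 0;
}

/*
 * A capability takes effect outside the namespace only if the task
 * holds it inside the namespace and the bit is set in cap_map.
 */
static inline bool sketch_cap_effective_outside(const struct sketch_userns *ns,
						uint64_t task_caps_in_ns,
						int cap)
{
	uint64_t bit = SKETCH_CAP_BIT(cap);

	return (task_caps_in_ns & bit) && (ns->cap_map & bit);
}
```

The point of this shape is that the decision depends on what the outer
namespace explicitly granted, not on which uid the container runs as.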