On Thu, Nov 3, 2016 at 9:30 AM, Lafcadio Wluiki <wluikil@xxxxxxxxx> wrote: > (Third, rebased submission, since first two submissions yielded no replies.) > > This adds a new per-task hidepid= flag that is honored by procfs when > presenting /proc to the user, in addition to the existing hidepid= mount > option. So far, hidepid= was exclusively a per-pidns setting. Locking > down a set of processes so that they cannot see other user's processes > without affecting the rest of the system thus currently requires > creation of a private PID namespace, with all the complexity it brings, > including maintaining a stub init process as PID 1 and losing the > ability to see processes of the same user on the rest of the system. > > With this patch all acesss and visibility checks in procfs now > honour two fields: > > a) the existing hide_pid field in the PID namespace > b) the new hide_pid in struct task_struct > > Access/visibility is only granted if both fields permit it; the more > restrictive one wins. By default the new task_struct hide_pid value > defaults to 0, which means behaviour is not changed from the status quo. > > Setting the per-process hide_pid value is done via a new PR_SET_HIDEPID > prctl() option which takes the same three supported values as the > hidepid= mount option. The per-process hide_pid may only be increased, > never decreased, thus ensuring that once applied, processes can never > escape such a hide_pid jail. When a process forks it inherits its > parent's hide_pid value. > > Suggested usecase: let's say nginx runs as user "www-data". After > dropping privileges it may now call: > > … > prctl(PR_SET_HIDEPID, 2); > … > > And from that point on neither nginx itself, nor any of its child > processes may see processes in /proc anymore that belong to a different > user than "www-data". Other services running on the same system remain > unaffected. > > This should permit Linux distributions to more comprehensively lock down > their services, as it allows an isolated opt-in for hidepid= for > specific services. Previously hidepid= could only be set system-wide, > and then specific services had to be excluded by group membership, > essentially a more complex concept of opt-out. > > A test-tool that validates this functionality is available here: > > https://paste.fedoraproject.org/412975/71967605/ > > Signed-off-by: Lafcadio Wluiki <wluikil@xxxxxxxxx> I like this idea: it meaningfully reduces attack surface, even though it doesn't ensure the same confinement as a pid namespace (e.g. a process with this prctl set can still direct syscalls to pids that are hidden). However, the attack surface in /proc is relatively large compared to the syscalls that use pids. For task launchers, this may overlap nicely with no_new_privs too. Some suggestions/nits below... > --- > fs/proc/array.c | 3 +++ > fs/proc/base.c | 6 ++++-- > include/linux/init_task.h | 1 + > include/linux/sched.h | 1 + > include/uapi/linux/prctl.h | 4 ++++ > kernel/fork.c | 1 + > kernel/sys.c | 10 ++++++++++ > 7 files changed, 24 insertions(+), 2 deletions(-) > > diff --git a/fs/proc/array.c b/fs/proc/array.c > index 81818ad..ea801e5 100644 > --- a/fs/proc/array.c > +++ b/fs/proc/array.c > @@ -163,6 +163,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, > const struct cred *cred; > pid_t ppid, tpid = 0, tgid, ngid; > unsigned int max_fds = 0; > + int hide_pid; > > rcu_read_lock(); > ppid = pid_alive(p) ? > @@ -183,6 +184,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, > task_lock(p); > if (p->files) > max_fds = files_fdtable(p->files)->max_fds; > + hide_pid = p->hide_pid; > task_unlock(p); > rcu_read_unlock(); > > @@ -201,6 +203,7 @@ static inline void task_state(struct seq_file *m, struct pid_namespace *ns, > seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->egid)); > seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->sgid)); > seq_put_decimal_ull(m, "\t", from_kgid_munged(user_ns, cred->fsgid)); > + seq_put_decimal_ull(m, "\nHidePID:\t", hide_pid); > seq_put_decimal_ull(m, "\nFDSize:\t", max_fds); > > seq_puts(m, "\nGroups:\t"); This should get an addition to table 1-2 of Documentation/filesystems/proc.txt that covers the contents of /proc/$pid/status > diff --git a/fs/proc/base.c b/fs/proc/base.c > index ae5e13c..6c9a42b 100644 > --- a/fs/proc/base.c > +++ b/fs/proc/base.c > @@ -709,7 +709,8 @@ static bool has_pid_permissions(struct pid_namespace *pid, > struct task_struct *task, > int hide_pid_min) > { > - if (pid->hide_pid < hide_pid_min) > + if (pid->hide_pid < hide_pid_min && > + current->hide_pid < hide_pid_min) > return true; > if (in_group_p(pid->pid_gid)) > return true; > @@ -730,7 +731,8 @@ static int proc_pid_permission(struct inode *inode, int mask) > put_task_struct(task); > > if (!has_perms) { > - if (pid->hide_pid == HIDEPID_INVISIBLE) { > + if (pid->hide_pid == HIDEPID_INVISIBLE || > + current->hide_pid == HIDEPID_INVISIBLE) { > /* > * Let's make getdents(), stat(), and open() > * consistent with each other. If a process Instead of open-coding both of these "||" tests, I think it might be cleaner to just choose the highest protection value, and use that in both comparisons. e.g. int hide_pid = max(pid->hide_pid, current->hide_pid); if (hide_pid < hide_pid_min) ... 9if (hide_pid == HIDEPID_INVISIBLE) ... > diff --git a/include/linux/init_task.h b/include/linux/init_task.h > index 325f649..c87de0e 100644 > --- a/include/linux/init_task.h > +++ b/include/linux/init_task.h > @@ -250,6 +250,7 @@ extern struct task_group root_task_group; > .cpu_timers = INIT_CPU_TIMERS(tsk.cpu_timers), \ > .pi_lock = __RAW_SPIN_LOCK_UNLOCKED(tsk.pi_lock), \ > .timer_slack_ns = 50000, /* 50 usec default slack */ \ > + .hide_pid = 0, \ > .pids = { \ > [PIDTYPE_PID] = INIT_PID_LINK(PIDTYPE_PID), \ > [PIDTYPE_PGID] = INIT_PID_LINK(PIDTYPE_PGID), \ > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 348f51b..3e8ca16 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -1572,6 +1572,7 @@ struct task_struct { > /* unserialized, strictly 'current' */ > unsigned in_execve:1; /* bit to tell LSMs we're in execve */ > unsigned in_iowait:1; > + unsigned hide_pid:2; /* per-process procfs hidepid= */ > #if !defined(TIF_RESTORE_SIGMASK) > unsigned restore_sigmask:1; > #endif > diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h > index a8d0759..ada62b6 100644 > --- a/include/uapi/linux/prctl.h > +++ b/include/uapi/linux/prctl.h > @@ -197,4 +197,8 @@ struct prctl_mm_map { > # define PR_CAP_AMBIENT_LOWER 3 > # define PR_CAP_AMBIENT_CLEAR_ALL 4 > > +/* Per process, non-revokable procfs hidepid= option */ > +#define PR_SET_HIDEPID 48 > +#define PR_GET_HIDEPID 49 > + > #endif /* _LINUX_PRCTL_H */ > diff --git a/kernel/fork.c b/kernel/fork.c > index 623259f..d4fe951 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1562,6 +1562,7 @@ static __latent_entropy struct task_struct *copy_process( > #endif > > p->default_timer_slack_ns = current->timer_slack_ns; > + p->hide_pid = current->hide_pid; > > task_io_accounting_init(&p->ioac); > acct_clear_integrals(p); > diff --git a/kernel/sys.c b/kernel/sys.c > index 89d5be4..c0a1d3e 100644 > --- a/kernel/sys.c > +++ b/kernel/sys.c > @@ -2270,6 +2270,16 @@ SYSCALL_DEFINE5(prctl, int, option, unsigned long, arg2, unsigned long, arg3, > case PR_GET_FP_MODE: > error = GET_FP_MODE(me); > break; > + case PR_SET_HIDEPID: > + if (arg2 < HIDEPID_OFF || arg2 > HIDEPID_INVISIBLE) > + return -EINVAL; > + if (arg2 < me->hide_pid) > + return -EPERM; > + me->hide_pid = arg2; > + break; > + case PR_GET_HIDEPID: > + error = put_user((int) me->hide_pid, (int __user *)arg2); > + break; > default: > error = -EINVAL; > break; > -- > 2.7.4 > Since this adds a new prctl interface, it's best to Cc linux-arch (which I added now). Thanks for proposing this! -Kees -- Kees Cook Nexus Security -- To unsubscribe from this list: send the line "unsubscribe linux-arch" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html