On 08/11/2011 11:09 PM, Serge Hallyn wrote:
> Quoting Daniel Lezcano (daniel.lezcano@xxxxxxx):
>> When the reboot syscall is called and the pid namespace where the calling
>> process belongs to is not from the init pidns, we send a SIGCHLD with CLD_REBOOTED
>> to the parent of this pid namespace.
>>
>> Signed-off-by: Daniel Lezcano <daniel.lezcano@xxxxxxx>
> ...
>
>> +void do_notify_parent_cldreboot(struct task_struct *tsk, int why, char *buffer)
>> +{
>> +	struct siginfo info = { };
>> +	struct task_struct *parent;
>> +	struct sighand_struct *sighand;
>> +	unsigned long flags;
>> +
>> +	if (tsk->ptrace)
>> +		parent = tsk->parent;
>> +	else {
>> +		tsk = tsk->group_leader;
>> +		parent = tsk->real_parent;
>> +	}
>> +
>> +	info.si_signo = SIGCHLD;
>> +	info.si_errno = 0;
>> +	info.si_status = why;
>> +
>> +	rcu_read_lock();
>> +	info.si_pid = task_pid_nr_ns(tsk, parent->nsproxy->pid_ns);
>> +	info.si_uid = __task_cred(tsk)->uid;
>
> This eventually should become:
>
> 	info.si_uid = user_ns_map_uid(task_cred_xxx(t, user_ns),
> 			current_cred(), current_uid());
>
> I've got a first-stab patch at converting the rest of
> kernel/signal.c in http://kernel.ubuntu.com/git?p=serge/userns-2.6.git

Ok, thanks.

>> +	rcu_read_unlock();
>> +
>> +	info.si_utime = cputime_to_clock_t(tsk->utime);
>> +	info.si_stime = cputime_to_clock_t(tsk->stime);
>> +
>> +	info.si_code = CLD_REBOOTED;
>> +
>> +	sighand = parent->sighand;
>> +	spin_lock_irqsave(&sighand->siglock, flags);
>> +	if (sighand->action[SIGCHLD-1].sa.sa_handler != SIG_IGN &&
>> +	    sighand->action[SIGCHLD-1].sa.sa_flags & SA_CLDREBOOT)
>> +		__group_send_sig_info(SIGCHLD, &info, parent);
>> +	/*
>> +	 * Even if SIGCHLD is not generated, we must wake up wait4 calls.
>> +	 */
>> +	__wake_up_parent(tsk, parent);
>> +	spin_unlock_irqrestore(&sighand->siglock, flags);
>> +}
> ...
>
>> @@ -426,10 +434,18 @@ SYSCALL_DEFINE4(reboot, int, magic1, int, magic2, unsigned int, cmd,
>>  {
>>  	char buffer[256];
>>  	int ret = 0;
>> +	struct pid_namespace *pid_ns = current->nsproxy->pid_ns;
>> +
>> +	/* We only trust the superuser with rebooting the system. */
>> +	if (!capable(CAP_SYS_BOOT)) {
> Doesn't this mean that an unprivileged task in a container can shut
> down the container?

Ha ha ! Right, good catch :)

Yes, rethinking about it, we can do what Bruno initially proposed and
simply prevent the reboot when we are not in the init_pid_ns.

Actually, sys_reboot occurs after the services shutdown and
"kill -1 SIGTERM" and "kill -1 SIGKILL", and it would not make sense to
do that in a child pid namespace, except if we are in a container where
we don't want to reboot :)

So IMO, it is safe to do:

	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
		return -EPERM;

	if (pid_ns != &init_pid_ns)
		return pid_namespace_reboot(pid_ns, cmd, buffer);

> The pidns->user_ns patch I sent earlier today gives you what you need
> so that you can add
>
> 	if (!ns_capable(current_pid_ns()->user_ns, CAP_SYS_BOOT))
> 		return -EPERM;
>
> right here to prevent that.
>
>> +		/* If we are not in the initial pid namespace, we send a signal
>> +		 * to the parent of this init pid namespace, notifying a shutdown
>> +		 * occurred */
>> +		if (pid_ns != &init_pid_ns)
>> +			pid_namespace_reboot(pid_ns, cmd, buffer);
>>
>> -	/* We only trust the superuser with rebooting the system. */
>> -	if (!capable(CAP_SYS_BOOT))
>>  		return -EPERM;
>> +	}
>>
>>  	/* For safety, we require "magic" arguments. */
>>  	if (magic1 != LINUX_REBOOT_MAGIC1 ||
>> --
>> 1.7.4.1
>>
>> _______________________________________________
>> Containers mailing list
>> Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
>> https://lists.linux-foundation.org/mailman/listinfo/containers
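
For readers following the thread, the difference between the two orderings discussed above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions: the two booleans stand in for the capability check (capable()/ns_capable()) and for the pid_ns == &init_pid_ns test, the struct and function names are hypothetical, and pid_namespace_reboot() is assumed to return 0. It is not kernel code and does not touch any real reboot path.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical result record for this model; not a kernel structure. */
struct reboot_result {
	int ret;              /* value sys_reboot would return at this point */
	bool parent_notified; /* models a call to pid_namespace_reboot() */
	bool continues;       /* true if control reaches the magic-number checks */
};

/*
 * Ordering in the posted patch: the namespace test sits inside the
 * failed-capability branch, so an unprivileged task in a child pid
 * namespace still triggers the notification before getting -EPERM.
 */
static struct reboot_result reboot_as_posted(bool has_cap, bool in_init_pid_ns)
{
	struct reboot_result r = { 0, false, false };

	if (!has_cap) {
		if (!in_init_pid_ns)
			r.parent_notified = true;
		r.ret = -EPERM;
		return r;
	}
	r.continues = true;
	return r;
}

/*
 * Ordering suggested in the reply: fail early without the capability,
 * and only then divert child pid namespaces to the notification.
 */
static struct reboot_result reboot_as_proposed(bool has_cap, bool in_init_pid_ns)
{
	struct reboot_result r = { 0, false, false };

	if (!has_cap) {
		r.ret = -EPERM;
		return r;
	}
	if (!in_init_pid_ns) {
		r.parent_notified = true; /* assumed to return 0 in this model */
		return r;
	}
	r.continues = true;
	return r;
}
```

Calling both functions with has_cap = false and in_init_pid_ns = false shows the issue raised in the review: the posted ordering marks the parent as notified, while the proposed ordering only returns -EPERM.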