On Fri, Jul 08, 2016 at 07:08:10PM +0800, Zhao Lei wrote: > Currently when we set core_pattern to a pipe, the pipe program is > forked by kthread running with root's permission, and write dumpfile > into host's filesystem. > Same thing happened for container, the dumper and dumpfile are also > in host(not in container). > > It have following program: > 1: Not consistent with file_type core_pattern > When we set core_pattern to a file, the container will write dump > into container's filesystem instead of host. > 2: Not safe for privileged container > In a privileged container, user can destroy host system by following > command: > # # In a container > # echo "|/bin/dd of=/boot/vmlinuz" >/proc/sys/kernel/core_pattern > # make_dump > > This patch switch dumper program's permission to init task, it is to say, > for container, dumper program have same permission with init task in > container, which make dumper program put in container's filesystem, and > write coredump into container's filesystem. > The dumper's permission is also limited into subset of container's init > process. > > See following discussion for detail: > http://www.gossamer-threads.com/lists/linux/kernel/2455363 > http://www.gossamer-threads.com/lists/linux/kernel/2397602 > > Todo: > 1: Set different core_pattern to host and each container. > 2: Keep compatibility with current design > All above are done in previous version of this patch, > need some restruct. > > Any suggestion is welcome for this patch. Ok, so those two todo items are I think absolute musts before we can consider any of this. Changing the default behavior that we have had ever since pipe in core_pattern and namespaces have existed would cause serious problem for existing users. Speaking for myself, LXC does setup limitations on privileged containers which prevent them from changing the core_pattern so that it can't be abused for privileged containers. The core_pattern on Ubuntu then sends all crashes to apport, as root, on the host. Apport is then container aware and handles forwarding the crash (through a unix socket) to the apport process inside the container. Your suggested change would likely make this all fail as apport on the host must be called as root to be able to access the container's filesystem (through /proc/<PID>/root). With namespacing in place, then we wouldn't need the apport relay trick on the host, so that'd be fine. And otherwise, making this behavior optional (different pattern or something) would be fine too (not breaking existing users). > > Suggested-by: Eric W. Biederman <ebiederm@xxxxxxxxxxxx> > Suggested-by: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx> > > Signed-off-by: Zhao Lei <zhaolei@xxxxxxxxxxxxxx> > --- > fs/coredump.c | 84 ++++++++++++++++++++++++++++++++++++++++++++++++- > include/linux/binfmts.h | 1 + > 2 files changed, 84 insertions(+), 1 deletion(-) > > diff --git a/fs/coredump.c b/fs/coredump.c > index 281b768..92dc343 100644 > --- a/fs/coredump.c > +++ b/fs/coredump.c > @@ -516,6 +516,8 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) > { > struct file *files[2]; > struct coredump_params *cp = (struct coredump_params *)info->data; > + struct task_struct *base_task; > + > int err = create_pipe_files(files, 0); > if (err) > return err; > @@ -524,10 +526,78 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) > > err = replace_fd(0, files[0], 0); > fput(files[0]); > + if (err) > + return err; > + > /* and disallow core files too */ > current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1}; > > - return err; > + base_task = cp->base_task; > + if (base_task) { > + const struct cred *base_cred; > + > + /* Set fs_root to base_task */ > + spin_lock(&base_task->fs->lock); > + set_fs_root(current->fs, &base_task->fs->root); > + spin_unlock(&base_task->fs->lock); > + > + /* Set namespaces to base_task */ > + switch_task_namespaces(current, base_task->nsproxy); > + > + /* Set cgroup to base_task */ > + current->flags &= ~PF_NO_SETAFFINITY; > + err = cgroup_attach_task_all(base_task, current); > + if (err < 0) > + return err; > + > + /* Set cred to base_task */ > + base_cred = get_task_cred(base_task); > + > + new->uid = base_cred->uid; > + new->gid = base_cred->gid; > + new->suid = base_cred->suid; > + new->sgid = base_cred->sgid; > + new->euid = base_cred->euid; > + new->egid = base_cred->egid; > + new->fsuid = base_cred->fsuid; > + new->fsgid = base_cred->fsgid; > + > + new->securebits = base_cred->securebits; > + > + new->cap_inheritable = base_cred->cap_inheritable; > + new->cap_permitted = base_cred->cap_permitted; > + new->cap_effective = base_cred->cap_effective; > + new->cap_bset = base_cred->cap_bset; > + new->cap_ambient = base_cred->cap_ambient; > + > + security_cred_free(new); > +#ifdef CONFIG_SECURITY > + new->security = NULL; > +#endif > + err = security_prepare_creds(new, base_cred, GFP_KERNEL); > + if (err < 0) { > + put_cred(base_cred); > + return err; > + } > + > + free_uid(new->user); > + new->user = base_cred->user; > + get_uid(new->user); > + > + put_user_ns(new->user_ns); > + new->user_ns = base_cred->user_ns; > + get_user_ns(new->user_ns); > + > + put_group_info(new->group_info); > + new->group_info = base_cred->group_info; > + get_group_info(new->group_info); > + > + put_cred(base_cred); > + > + validate_creds(new); > + } > + > + return 0; > } > > void do_coredump(const siginfo_t *siginfo) > @@ -590,6 +660,7 @@ void do_coredump(const siginfo_t *siginfo) > > if (ispipe) { > int dump_count; > + struct task_struct *vinit_task; > char **helper_argv; > struct subprocess_info *sub_info; > > @@ -631,6 +702,12 @@ void do_coredump(const siginfo_t *siginfo) > goto fail_dropcount; > } > > + vinit_task = find_task_by_vpid(1); > + if (!vinit_task) { > + printk(KERN_WARNING "failed getting init task info, skipping core dump\n"); > + goto fail_dropcount; > + } > + > helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL); > if (!helper_argv) { > printk(KERN_WARNING "%s failed to allocate memory\n", > @@ -638,6 +715,10 @@ void do_coredump(const siginfo_t *siginfo) > goto fail_dropcount; > } > > + get_task_struct(vinit_task); > + > + cprm.base_task = vinit_task; > + > retval = -ENOMEM; > sub_info = call_usermodehelper_setup(helper_argv[0], > helper_argv, NULL, GFP_KERNEL, > @@ -646,6 +727,7 @@ void do_coredump(const siginfo_t *siginfo) > retval = call_usermodehelper_exec(sub_info, > UMH_WAIT_EXEC); > > + put_task_struct(vinit_task); > argv_free(helper_argv); > if (retval) { > printk(KERN_INFO "Core dump to |%s pipe failed\n", > diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h > index 314b3ca..0c9a72c 100644 > --- a/include/linux/binfmts.h > +++ b/include/linux/binfmts.h > @@ -59,6 +59,7 @@ struct linux_binprm { > > /* Function parameter for binfmt->coredump */ > struct coredump_params { > + struct task_struct *base_task; > const siginfo_t *siginfo; > struct pt_regs *regs; > struct file *file; > -- > 1.8.5.1 > > > > _______________________________________________ > Containers mailing list > Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx > https://lists.linuxfoundation.org/mailman/listinfo/containers -- Stéphane Graber Ubuntu developer http://www.ubuntu.com
Attachment:
signature.asc
Description: PGP signature
_______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers