On Tue, Jul 02, 2013 at 09:35:39AM -0700, Eric W. Biederman wrote:
> Gao feng <gaofeng@xxxxxxxxxxxxxx> writes:
>
> > On 07/02/2013 05:57 PM, Eric W. Biederman wrote:
> >> "Daniel P. Berrange" <berrange@xxxxxxxxxx> writes:
> >>
> >>> On Tue, Jul 02, 2013 at 10:56:37AM +0200, Richard Weinberger wrote:
> >>>> On 02.07.2013 10:44, Eric W. Biederman wrote:
> >>>>> Gao feng <gaofeng@xxxxxxxxxxxxxx> writes:
> >>>>>
> >>>>>> On 07/02/2013 12:16 AM, Daniel P. Berrange wrote:
> >>>>>>> I'm struggling debugging a strange problem with interaction between user
> >>>>>>> namespaces, cap_set and ownership of files in /proc/1/
> >>>>>>>
> >>>>>>
> >>>>>> This problem occurs after we call setuid/gid.
> >>>>>>
> >>>>>> for example, a task whose pid is 1234 calls
> >>>>>> setregid(10,10);
> >>>>>> setreuid(10,10);
> >>>
> >>> It seems to get reset to the right values (0:0) when we execve()
> >>> the init binary though. This doesn't happen if we have invoked
> >>> the capset() syscall in between the setregid & the execve() calls.
> >>
> >> Yes, execve() should reset the dumpable state.
> >>
> >> I took a quick look and I don't see a way around set_dumpable calls in
> >> setup_new_exec. Why the process remains undumpable after exec is worth
> >> investigating. That logic should not be user namespace specific
> >> however.
> >>
> >
> > I think it's install_exec_creds; it calls commit_creds, which sets the
> > process undumpable
> >
> >     /* dumpability changes */
> >     if (!uid_eq(old->euid, new->euid) ||
> >         !gid_eq(old->egid, new->egid) ||
> >         !uid_eq(old->fsuid, new->fsuid) ||
> >         !gid_eq(old->fsgid, new->fsgid) ||
> >         !cred_cap_issubset(old, new)) {
> >             if (task->mm)
> >                     set_dumpable(task->mm, suid_dumpable);
> >             task->pdeath_signal = 0;
> >             smp_wmb();
> >     }
>
> That looks like it could do it. Especially if exec is increasing your
> capabilities.

Ah, yes, that would explain it. My demo is removing the SYS_MODULE
capability, and then exec'ing the shell binary.
Since we are uid==0, and prctl(PR_CAPBSET_DROP) is not available inside
the user namespace, the rules for capabilities vs execve() mean that the
shell binary regains the SYS_MODULE capability bit.

So the problem I'm seeing in libvirt is entirely a result of the fact
that we can't use PR_CAPBSET_DROP inside the user namespace. Given that,
there's no point trying to drop any capabilities inside the user
namespace.

The only slight problem here is that we want to drop CAP_MKNOD so that
systemd can detect that it shouldn't attempt to run any units which
would rely on mknod.

Daniel