On Fri, Apr 26, 2013 at 2:54 AM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
> richard -rw- weinberger <richard.weinberger@xxxxxxxxx> writes:
>
>> On Wed, Mar 27, 2013 at 10:26 PM, Michael Kerrisk (man-pages)
>> <mtk.manpages@xxxxxxxxx> wrote:
>>> Inside the user namespace, the shell has user and group ID 0,
>>> and a full set of permitted and effective capabilities:
>>>
>>> bash$ cat /proc/$$/status | egrep '^[UG]id'
>>> Uid: 0 0 0 0
>>> Gid: 0 0 0 0
>>> bash$ cat /proc/$$/status | egrep '^Cap(Prm|Inh|Eff)'
>>> CapInh: 0000000000000000
>>> CapPrm: 0000001fffffffff
>>> CapEff: 0000001fffffffff
>>
>> I've tried your demo program, but inside the new ns I'm automatically nobody.
>> As Eric said, setuid(0)/setgid(0) are missing.
>
> Is it the setuid/setgid or not setting up the uid/gid map?

The uid/gid mappings are set up.

>> Eric, maybe you can help me. How can I drop capabilities within a user
>> namespace?
>
>> In childFunc() I did add prctl(PR_CAPBSET_DROP, CAP_NET_ADMIN) but it always
>> returns EPERM.
>> Why is that? I thought I get a completely fresh set of caps which I can modify.
>> I don't want uid 0 inside the container to have all caps.
>
> There are weird things that happen with exec and the user namespace. If
> you have exec'd as an unmapped user all of your capabilities have
> already been dropped.

I've set up the mappings. If I look into /proc/*/status I see that my process
has all caps.
So, in general, is it possible to drop caps within a user namespace?
I really want to drop CAP_NET_ADMIN and some others; root within my container
must not change any networking settings.

>> And why does /proc/*/loginuid always contain 4294967295 in a new user namespace?
>> Writing to it also fails. (Noticed that because pam_loginuid.so does not work).
>
> Almost certainly because the loginuid has already been set. Yes. It
> looks like I am simply using from_kuid instead of from_kuid_munged on
> the read. So an unmapped loginuid will be reported as 4294967295.
>
> For some circumstances 65534 (nobody) is definitely better, in some it is
> a toss up, and most of the time no one really cares. So I have tried to
> do something but in this case I don't know which was the best policy.

Hmm, I had hoped that loginuid would be reset upon entering a user namespace.

>> Final question, is it by design that uid 0 within a namespace is not
>> allowed to write to
>> /proc/*/oom_score_adj?
>
> Essentially. It is by design that uid 0 within a namespace be mapped to
> some other uid outside the namespace, and that the permissions on writes
> should use the permission needed outside of the user namespace.

Okay, I asked because systemd is a heavy user of this file and fails
because of this within a user namespace.
Luckily it is possible to remove all the score changes from the .service files.

-- 
Thanks,
//richard