Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> Andrew Lutomirski <luto@xxxxxxx> writes:
>
> > On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
> > <ebiederm@xxxxxxxxxxxx> wrote:
> >> Andrew Lutomirski <luto@xxxxxxx> writes:
> >>
> >>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
> >>> <ebiederm@xxxxxxxxxxxx> wrote:
> >>>> Andy Lutomirski <luto@xxxxxxx> writes:
> >>>>
> >>>> My understanding of no_new_privs is that current_cred() including
> >>>> the user, the user namespace and the security label will never change,
> >>>> with the goal of making the security analysis simple.
> >>>
> >>> They can change but only if you already have the privilege to change
> >>> them yourself and then you do so. For example, PR_SET_NO_NEW_PRIVS,
> >>> setuid, then drop caps is allowed and useful -- it's a race-free way
> >>> to make sure that a given uid never executes without no_new_privs set.
> >>> I've implemented this as a pam module.
> >>
> >> Careful. There is the security_task_fix_setuid call that will raise
> >> your capabilities from cap->effective to cap->permitted if you call
> >> setuid(0). Which in the general case means you can regain all of the
> >> root privileges if you only have CAP_SETUID.
> >>
> >
> > That's fine. If you're running with CAP_SETUID and default
> > securebits, then you effectively have all capabilities already and
> > don't need to exploit setuid binaries to gain them. no_new_privs
> > doesn't change that. If you don't want to be able to gain all privs,
> > change securebits or drop CAP_SETUID. seccomp reduces the kernel
> > attack surface; no_new_privs reduces the userspace attack surface.
> > But see below...
> >
> >
> >>
> >>>> I don't recall how seccomp filters are dealt with if you don't have
> >>>> no_new_privs enabled. If seccomp filters installed by root
> >>>> are dropped when we change privilege levels it might be worth looking
> >>>> at how to keep a seccomp filter installed as long as you stay in
> >>>> a user namespace.
> >>>>
> >>>
> >>> They're not dropped. I think in the current implementation they can't
> >>> be dropped at all.
> >>
> >> Which makes sense. Is this why you need no_new_privs? So you can't run
> >> seccomp on higher privileged executables and confuse them into keeping
> >> privileges when they should not?
> >
> > Exactly. seccomp is flexible enough that it's probably possible to
> > confuse many setuid executables with it.
> >
> >>
> >>>> The emphasis is a bit different from no_new_privs as the user_namespace
> >>>> does not need to guarantee that the lsm will not change security labels,
> >>>> etc.
> >>>
> >>> Hmm. Is this safe? For example, if there's a program that LSM policy
> >>> grants extra privileges that malfunctions when run inside a user
> >>> namespace, can that be used to break out of LSM restrictions?
> >>
> >> I can't see how it would not be safe.
> >>
> >> Except for the user namespace pointer the state the LSM and the rest of
> >> the kernel sees is the same state the kernel sees. Aka userspace sees
> >> uid 0, the LSM does not. So I don't know why an LSM would get confused.
> >>
> >> Beyond that it is a bug for an LSM to grant permissions beyond the
> >> core DAC model. So the worst I can see is an LSM not grokking user
> >> namespaces and getting confused and not restricting a process as
> >> much as the designer of the LSM would like.
> >
> > Right. Suppose you have some program that has extra restrictions
> > applied by an LSM. It executes a helper (e.g. Apache's suexec
> > thing, but I bet there are more examples) which is supposed to be very
> > careful not to leak privileges. The LSM is set to restrict that
> > helper less than the parent process. But that program was written
> > before user namespaces existed, and it has a bug (or missing feature)
> > that allows its parent to exploit it when run inside an unmapped user
> > namespace. The parent can now escape from the LSM restrictions.
> >
> > no_new_privs is designed to prevent exactly this issue.
>
> Currently the suid exec will fail because the uids don't map.
>
> I might switch that around to simply ignoring the change of uid
> on suid exec. I have a patch in my devel tree that plays with
> that idea. However, although I hit that case once in testing
> (I think it was ping), I don't think running suid executables
> is particularly interesting.
>
> Certainly the application program won't care or break, because we are
> still bounded by the usual DAC security.
>
> I wonder a little if the LSM might change labels on exec of a
> non suid binary. That case is more interesting in the unmapped
> unprivileged user namespace.

They will (change labels on exec of a non-suid binary). But first, any
well-behaved user of user namespaces will switch to a (SELinux, Smack,
AppArmor, whatever) context which is aware it is namespaced, so that
only desired transitions happen.

So we're left with the concern that uid 1001 creates an unprivileged
user namespace and runs a program (as uid 0 in that namespace) which
transitions him to uber_client_t. Since, as Eric has pointed out, the
MAC can't override the DAC rules, it still won't be able to write to
files not owned by uid 1001 in the initial user namespace.

We might worry about it connecting to the privileged server and passing
its uber_client_t credentials along with a request. The server, being in
the initial user namespace, will see uid 1001, not 0. Perhaps the client
checks its uid (0 in its user namespace) and passes it to the server (as
a simple message), which blindly accepts it. In that case the server
could just as easily be exploited without user namespaces.

It's possible that there's another way this can be exploited, but I
haven't thought of it yet.

-serge