Markus Gutschke <markus@xxxxxxxxxxxx> writes:

> On Tue, Apr 10, 2012 at 15:15, Andrew Lutomirski <luto@xxxxxxx> wrote:
>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>> > With no mappings you can not create a new user namespace or change or
>> > uid or gids, and suid exec fails (or possibly ignores the uid/gid change
>> > but I am starting with suid exec fails).  Making user namespaces similar
>> > to no_new_privs.
>>
>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>> grants extra privileges that malfunctions when run inside a user
>> namespace, can that be used to break out of LSM restrictions?
>
> Is creation without a mapping similar to some of the other CLONE_XXX
> flags that essentially give you a new anonymous and ephemeral
> namespace?

Close.  The primary purpose is to make it simpler to set up the mapping.
Strictly speaking you are still running with your original user id,
which just doesn't happen to map in a way that is useful for getuid()
or stat, but still works for the permission checks.

> Or does it just give you a 1:1 mapping to the parent's namespace?

There is no mapping set up.  In the parent user namespace you see an
unchanged uid.  In the new user namespace you see your uid as
overflowuid aka 65534 aka nobody.

> The former would conceivably be useful for sandboxing purposes. Every
> so often, it is desirable to run a process as a user id that is
> distinct from any other user id in the system. But this usually
> requires the explicit creation of a new entry in /etc/passwd; and of
> course it also takes a privileged user to switch to this new user id.
> So, unprivileged processes can usually not switch to a dedicated user
> id. I could see the benefit in being able to create an ephemeral
> anonymous user id.

You don't get an ephemeral/anonymous user id, but you do get sandboxed
into the user namespace, which ultimately will make other namespaces
usable.

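To make the "no mapping means overflowuid" behaviour concrete, here is a
minimal sketch in Python.  It is only an illustrative model, not kernel
code: the function name view_uid and the (ns_first, initial_first, count)
tuple layout are assumptions made up for this example, and it ignores
nested namespaces by treating the mapping as a single level.

```python
# Illustrative model only; names and layout are hypothetical, not kernel code.
OVERFLOW_UID = 65534  # default overflowuid, displayed as "nobody"


def view_uid(initial_uid, extents):
    """Return how a uid from the initial namespace appears inside a namespace.

    extents is a list of (ns_first, initial_first, count) tuples.  An empty
    list models a freshly created user namespace with no mapping set up.
    """
    for ns_first, initial_first, count in extents:
        # The uid maps if it falls inside this extent's initial-namespace range.
        if initial_first <= initial_uid < initial_first + count:
            return ns_first + (initial_uid - initial_first)
    # No extent covers the uid: it is reported as overflowuid.
    return OVERFLOW_UID
```

With no extents, view_uid(1000, []) reports 65534; after adding the
extent (0, 1000, 1), the same uid shows up as 0 inside the namespace.
In both cases the uid in the initial user namespace is still 1000, and
that is the one the permission checks use.
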
> Of course, if the kernel provided for anonymous user ids, this would
> have interesting semantics throughout the system. E.g. what happens if
> the process attempts to create a new file in /tmp. Would that be
> allowed? If so, who would be the owner of the file. Presumably, file
> systems don't have any way to represent the fact the user id is
> ephemeral. So, an application should be denied file system accesses
> unless they obtained a file descriptor that was opened outside of the
> namespace.

Which is why I skipped that.  The entire purpose of this patchset is to
make it so that you always, always, always have a uid that maps into the
initial user namespace, so as to avoid creating strange cases to consider
in the permission checks or the rest of the logic throughout the kernel.

> What happens if credentials are passed with SCM_CREDENTIALS? Do they
> get translated? Does this work in both directions (i.e. passing in and
> out of the namespace)?
>
> What happens to permissions on files in /proc?
>
> Can the creator of a namespace send signals to processes in the
> namespace? How about the reverse?

The logic is that the uid in the initial user namespace is translated
into whatever user namespace you are in.  If the uid maps you get the
mapped uid.  If the uid does not map (the initial state) you get
overflowuid.

> But maybe, this is just too complicated and anonymous ephemeral user
> ids are not really doable.

Which is why the user namespace needs this course correction I am
putting it on.  Too much pain for too little gain in dealing with uids
that don't map in a useful way for the permission checks.

Adding the one extra constraint that uids always map to the initial user
namespace makes the code fast and simple, at very little cost in
flexibility.

Eric