> On May 11, 2019, at 4:08 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> [ on mobile again, power is off and the wifi doesn't work, so I'm reading email on my phone and apologizing once more for horrible html email.. ]
>
>> On Sat, May 11, 2019, 18:40 Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>
>> a) Change all my UIDs and GIDs to match a container, enter that
>> container's namespaces, and run some binary in the container's
>> filesystem, all atomically enough that I don't need to worry about
>> accidentally leaking privileges into the container. A
>> super-duper-non-dumpable mode would kind of allow this, but I'd worry
>> that there's some other hole besides ptrace() and /proc/self.
>
>
> So I would expect that you'd want to do this even *without* doing an execve at all, which is why I still don't think this is actually about spawn/execve at all.
>
> But I agree that what you want sounds reasonable. But I think the "dumpable" flag has always been a complete hack, and very unreliable with random meanings (and random ways to then get around it).
>
> We have actually had lots of people wanting to run "lists of system calls" kinds of things. Sometimes for performance reasons, sometimes for other random reasons. Maybe that "atomicity" angle would be another one, although we would have to make the setuid etc things be something that happens at the end.
>
> So rather than "spawn", I'd much rather see people think about ways to just batch operations in general, rather than think it is about batching things just before a process change.
>
>> b) Change all my UIDs and GIDs to match a container, enter that
>> container's namespaces, and run some binary that is *not* in the
>> container's filesystem.
>
>
> Hey, you could probably do something very close to that by opening the executable you want to run first, making it O_CLOEXEC, then doing all the other things (which now makes the executable inaccessible), and finally doing execveat() on the file descriptor..
>
> I say "something very close" because even though it's O_CLOEXEC, only the file would be closed, and /proc/self/exe would still exist.
>
> But we really could make that execveat of an O_CLOEXEC file thing also disable access to /proc/*/exe, and it sounds like a sane and simple extension in general to do..
>
>

I bet this will break something that already exists. An execveat() flag to turn off /proc/self/exe would do the trick, though.
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
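
For reference, the open-first / execveat()-last sequence suggested above might look roughly like the sketch below. This is only an illustration, not anything proposed in the thread: exec_preopened(), run_preopened() and the enter_fn hook are hypothetical names, and the hook merely stands in for whatever setns()/setgid()/setuid() calls the caller needs. It uses the raw syscall because the libc wrapper for execveat() is not available everywhere. As noted above, /proc/self/exe still refers to the binary after the exec, and (per the execveat(2) man page) a close-on-exec descriptor only works for real binaries, not for "#!" scripts.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000    /* Linux value, in case the header hides it */
#endif

/* Hypothetical hook: the setns()/setgid()/setuid() steps would go here. */
typedef int (*enter_fn)(void);

/*
 * Open the executable first, run the caller's "enter the container" step,
 * then execute the already-open file. Returns only on failure.
 */
static int exec_preopened(const char *path, char *const argv[],
                          char *const envp[], enter_fn enter)
{
    assert(path != NULL && argv != NULL);

    /* O_CLOEXEC: the descriptor itself does not survive the exec. */
    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;

    /* After this step the path may no longer be reachable; the fd still is. */
    if (enter != NULL && enter() != 0) {
        close(fd);
        return -1;
    }

    /* Empty pathname plus AT_EMPTY_PATH means "execute fd itself". */
    syscall(SYS_execveat, fd, "", argv, envp, AT_EMPTY_PATH);
    close(fd);
    return -1;
}

/* Demo helper: fork, exec in the child, return the child's exit status. */
static int run_preopened(const char *path, enter_fn enter)
{
    char *const argv[] = { (char *)path, NULL };
    char *const envp[] = { NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        exec_preopened(path, argv, envp, enter);
        _exit(127);             /* only reached if the exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_preopened("/bin/true", NULL) should run the binary through its descriptor alone. A real caller would pass a hook that does the namespace and credential changes, with the setuid-style steps last.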