Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes: > On Tue, May 12, 2020 at 11:46 AM Eric W. Biederman > <ebiederm@xxxxxxxxxxxx> wrote: >> >> I am still thinking about this one, but here is where I am at. At a >> practical level passing the file descriptor of the script to interpreter >> seems like something we should encourage in the long term. It removes >> races and it is cheaper because then the interpreter does not have to >> turn around and open the script itself. > > Yeah, I think we should continue to support it, because I think it's > the right thing to do (and we might just end up having compatibility > issues if we don't). > > How about trying to move the logic to the common code, out of binfmt_misc? > > IOW, how about something very similar to your "brpm->preserve_creds" > thing that you did for the credentials (also for binfmt_misc, which > shouldn't surprise anybody: binfmt_misc is simply the "this is the > generic thing for letting user mode do the final details"). > >> Calling fd_install in binfmt_misc still seems wrong, as that exposes >> the new file descriptor to user space with the old creds. > > Right. And it really would be good to simply not have these kinds of > very special cases inside the low-level binfmt code: I'd much rather > have the special cases in the generic code, so that we see what the > ordering is etc. One of the big problems with all these binfmt > callbacks has been the fact that it makes it so hard to think about > and change the generic code, because the low-level binfmt handlers all > do their own special thing. > > So moving it to generic code would likely simplify things from that > angle, even if the actual complexity of the feature itself remains. > > Besides, we really have exposed this to other code anyway thanks to > that whole bprm->interp_data thing, and the AT_EXECFD AUX entries that > we have. So it's not really "internal" to binfmt_misc _anyway_. > > So how about we just move the fd_binary logic to the generic execve > code, and just binfmt_misc set the flag for "yes, please do this", > exactly like "preserve_creds"? > >> It is possible although unlikely for userspace to find the file >> descriptor without consulting AT_EXECFD so just to be conservative I >> think we should install the file descriptor in begin_new_exec even if >> the next interpreter does not support AT_EXECFD. > > Ack. I think the AT_EXECFD thing is a sign that this isn't internal to > binfmt_misc, but it also shouldn't be gating this issue. In reality, > ELF is the only real binary format that matters - the script/misc > binfmts are just indirection entries - and it supports AT_EXECFD, so > let's just ignore the theoretical case of "maybe nobody exposes it". > > So yes, just make it part of begin_new_exec(), and there's no reason > to support more than a single fd. No stacks or arrays of these things > required, I feel. It's not like AT_EXECFD supports the notion of > multiple fd's being reported anyway, nor does it make any sense to > have some kind of nested misc->misc binfmt nesting. > > So making that whole interp_data and fd_binary thing be a generic > layer thing would make the search_binary_handler() code in binfmt_misc > be a pure tailcall too, and then the conversion to a loop ends up > working and being the right thing. That is pretty much what I have been thinking. I have just been taking it slow so I find as many funny corner cases as I can. Nothing ever clears the BINPRM_FLAGS_EXECFD so the current code can not support nesting. Now I do think a nested misc->misc binfmt thing can make sense in principal. I have an old dos spectrum emulator that I use to play some of the games that I grew up with. Running that emulator makes me two emulators deep. I can also imagine writting a domain specific language in python or perl, and setting things up so scripts in the domain specific language can be run directly. So I think I need to deliberately test and prevent a nested misc->misc, just so data structures don't get stomped. If the cases where it could useful prove sufficiently interesting we can enable them later. Eric