Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:

> [Added Eric Biederman, since I think your tree might be a reasonable
> route forward for these patches.]
>
> On Thu, Jun 5, 2014 at 6:40 AM, David Drysdale <drysdale@xxxxxxxxxx> wrote:
>> Resending, adding cc:linux-api.
>>
>> Also, it may help to add a little more background -- this patch is
>> needed as a (small) part of implementing Capsicum in the Linux kernel.
>>
>> Capsicum is a security framework that has been present in FreeBSD since
>> version 9.0 (Jan 2012), and is based on concepts from object-capability
>> security [1].
>>
>> One of the features of Capsicum is capability mode, which locks down
>> access to global namespaces such as the filesystem hierarchy.  In
>> capability mode, /proc is thus inaccessible and so fexecve(3) doesn't
>> work -- hence the need for a kernel-space
>
> I just found myself wanting this syscall for another reason: injecting
> programs into sandboxes or otherwise heavily locked-down namespaces.
>
> For example, I want to be able to reliably do something like nsenter
> --namespace-flags-here toybox sh.  Toybox's shell is unusual in that
> it is more or less fully functional, so this should Just Work (tm),
> except that the toybox binary might not exist in the namespace being
> entered.  If execveat were available, I could rig nsenter or a similar
> tool to open it with O_CLOEXEC, enter the namespace, and then call
> execveat.
>
> Is there any reason that these patches can't be merged more or less as
> is for 3.19?

Yes.  There is a silliness in how it implements fexecve.  The fexecve
case should use the empty string "", not a NULL pointer, to indicate
that.  That change will then harmonize execveat with the other ...at
system calls, simplify the code, and remove a special case.

I believe using the empty string "" requires implementing the
AT_EMPTY_PATH flag.

For sandboxes execveat seems to make a great deal of sense.
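To make the suggested convention concrete, here is a minimal sketch of
an fexecve-style call that passes the empty string "" together with
AT_EMPTY_PATH.  It assumes a kernel and headers that provide
SYS_execveat with that calling convention, and it goes through
syscall(2) rather than relying on a libc wrapper.  The helper names
exec_fd and run_via_fd are hypothetical and exist only for this
illustration; they are not part of the patches under discussion.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>        /* open, O_RDONLY, O_CLOEXEC, AT_EMPTY_PATH */
#include <stdlib.h>
#include <sys/syscall.h>  /* SYS_execveat */
#include <sys/types.h>
#include <sys/wait.h>     /* waitpid, WIFEXITED, WEXITSTATUS */
#include <unistd.h>       /* fork, syscall, _exit */

#ifndef AT_EMPTY_PATH
#define AT_EMPTY_PATH 0x1000  /* value from the Linux UAPI headers */
#endif

extern char **environ;

/*
 * Hypothetical helper: execute the program that fd refers to.
 * The empty pathname plus AT_EMPTY_PATH means "operate on fd itself",
 * the same convention the other ...at calls use.
 * Returns only on failure (-1 with errno set).
 */
static int exec_fd(int fd, char *const argv[])
{
    return (int)syscall(SYS_execveat, fd, "", argv, environ,
                        AT_EMPTY_PATH);
}

/*
 * Hypothetical helper: open path, then exec it through the descriptor
 * in a child process.  Returns the child's exit status, 127 if the
 * open or the exec failed in the child, or -1 on fork/wait errors.
 */
static int run_via_fd(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* O_CLOEXEC keeps the descriptor from leaking into the new
         * program; a sandboxing tool would enter the namespace
         * between the open and the exec. */
        int fd = open(path, O_RDONLY | O_CLOEXEC);
        if (fd >= 0)
            exec_fd(fd, argv);
        _exit(127);  /* reached only if open or exec failed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With argv set to { "sh", "-c", "exit 7", NULL }, calling
run_via_fd("/bin/sh", argv) should hand back the shell's exit status.
One caveat of the O_CLOEXEC variant: it is only expected to work for
binaries, since an interpreter started for a script can no longer see
a descriptor that was closed on exec.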
I can get the same functionality by passing in a directory file
descriptor, calling fchdir, and then calling execve, so execveat should
not introduce any new security holes.  And using the final file
descriptor removes a race.

AT_SYMLINK_NOFOLLOW seems to have some limited utility as well,
although for exec I don't know what problems it can solve.

Until I am done moving I won't have time to pick this up, and the code
clearly needs another revision, but I will be happy to work to see that
we get a sane execveat implemented.

Eric

p.s. I don't believe there are any namespace issues where doing
something with execveat flags makes sense.

--
To unsubscribe from this list: send the line "unsubscribe linux-arch" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html