On Thu, Dec 13, 2018 at 06:36:15PM +0100, Mickaël Salaün wrote:
> On 13/12/2018 18:13, Matthew Wilcox wrote:
> > On Thu, Dec 13, 2018 at 04:17:29PM +0100, Mickaël Salaün wrote:
> >> Adding a new syscall for this simple use case seems excessive. I think
> >
> > We have somewhat less than 400 syscalls today. We have 20 O_ bits defined.
> > Obviously there's a lower practical limit on syscalls, but in principle
> > we could have up to 2^32 syscalls, and there are only 12 O_ bits remaining.
> >
> >> that the open/openat syscall family is the right place to do an atomic
> >> open and permission check, the same way the kernel does for other file
> >> access. Moreover, it will be easier to patch upstream interpreters
> >> without the burden of handling a (new) syscall that may not exist on the
> >> running system, whereas unknown open flags are ignored.
> >
> > Ah, but that's the problem. The interpreter can see an -ENOSYS response
> > and handle it appropriately. If the flag is silently ignored, the
> > interpreter has no idea whether it can do a racy check or whether to
> > skip even trying to do the check.
>
> Right, but the interpreter should interpret the script if the open with
> O_MAYEXEC succeeds (but not otherwise): it may be because the flag is
> known by the kernel and the system policy allows this call, or because
> the (old) kernel doesn't know about this flag (which is fine and needed
> for backward compatibility). The script interpretation must not fail
> if the kernel doesn't support O_MAYEXEC; it is then useless for the
> interpreter to do any additional check.

If that's the way interpreters want to work, then that's fine. They can
just call the verify() syscall and ignore the -ENOSYS. Done. Or somebody
who cares very, very deeply can change the interpreter to decline to run
any scripts if the kernel returns -ENOSYS.
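
To make the two options concrete, here is a rough sketch in C of how an
interpreter might treat -ENOSYS. Nothing here is an existing API: verify()
is only the syscall proposed in this thread, so it is modelled as a function
pointer (verify_fn), and may_interpret(), the policy enum, and the three stub
functions are hypothetical names made up for illustration.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the proposed verify() syscall wrapper:
 * returns 0 if system policy allows interpreting the file behind fd,
 * or -1 with errno set (ENOSYS if the kernel lacks the syscall).
 */
typedef int (*verify_fn)(int fd);

/* What the interpreter does when the kernel cannot answer at all. */
enum unsupported_policy {
    ALLOW_IF_UNSUPPORTED,   /* ignore -ENOSYS, run the script anyway */
    DENY_IF_UNSUPPORTED     /* refuse to run scripts on such kernels  */
};

/* Decide whether the interpreter may run the script open on fd. */
bool may_interpret(int fd, verify_fn verify, enum unsupported_policy policy)
{
    if (verify(fd) == 0)
        return true;                    /* kernel checked and allowed it */

    if (errno == ENOSYS)
        return policy == ALLOW_IF_UNSUPPORTED;

    return false;                       /* kernel checked and refused */
}

/* Stubs simulating the three kernel answers, for demonstration only. */
int verify_allowed(int fd) { (void)fd; return 0; }
int verify_denied(int fd)  { (void)fd; errno = EACCES; return -1; }
int verify_missing(int fd) { (void)fd; errno = ENOSYS; return -1; }
```

The point of the sketch is that the interpreter can tell "kernel refused"
apart from "kernel cannot check" and pick its own policy for the latter,
which is not possible when an unknown open flag is silently ignored.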