On Mon, Aug 19, 2019 at 8:37 PM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
The most obvious syscall to add support for the new LOOKUP_* scoping flags would be openat(2). However, there are a few reasons why this is not the best course of action: * The new LOOKUP_* flags are intended to be security features, and openat(2) will silently ignore all unknown flags. This means that users would need to avoid foot-gunning themselves constantly when using this interface if it were part of openat(2). This can be fixed by having userspace libraries handle this for users[1], but should be avoided if possible. * Resolution scoping feels like a different operation to the existing O_* flags. And since openat(2) has limited flag space, it seems to be quite wasteful to clutter it with 5 flags that are all resolution-related. Arguably O_NOFOLLOW is also a resolution flag but its entire purpose is to error out if you encounter a trailing symlink -- not to scope resolution. * Other systems would be able to reimplement this syscall allowing for cross-OS standardisation rather than being hidden amongst O_* flags which may result in it not being used by all the parties that might want to use it (file servers, web servers, container runtimes, etc). * It gives us the opportunity to iterate on the O_PATH interface. In particular, the new @how->upgrade_mask field for fd re-opening is only possible because we have a clean slate without needing to re-use the ACC_MODE flag design nor the existing openat(2) @mode semantics. To this end, we introduce the openat2(2) syscall. It provides all of the features of openat(2) through the @how->flags argument, but also also provides a new @how->resolve argument which exposes RESOLVE_* flags that map to our new LOOKUP_* flags. It also eliminates the long-standing ugliness of variadic-open(2) by embedding it in a struct. In order to allow for userspace to lock down their usage of file descriptor re-opening, openat2(2) has the ability for users to disallow certain re-opening modes through @how->upgrade_mask. At the moment, there is no UPGRADE_NOEXEC. The open_how struct is padded to 64 bytes for future extensions (all of the reserved bits must be zeroed).
Why pad the structure when new functionality (perhaps accommodated via a larger structure) could be signaled by passing a new flag? Adding reserved fields to a structure with a size embedded in the ABI makes a lot of sense --- e.g., pthread_mutex_t can't grow. But this structure can grow, so the reservation seems needless to me.