On Mon, Aug 19, 2019 at 8:37 PM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
>
> The most obvious syscall to add support for the new LOOKUP_* scoping
> flags would be openat(2). However, there are a few reasons why this is
> not the best course of action:
>
>  * The new LOOKUP_* flags are intended to be security features, and
>    openat(2) will silently ignore all unknown flags. This means that
>    users would need to avoid foot-gunning themselves constantly when
>    using this interface if it were part of openat(2). This can be fixed
>    by having userspace libraries handle this for users[1], but should be
>    avoided if possible.
>
>  * Resolution scoping feels like a different operation to the existing
>    O_* flags. And since openat(2) has limited flag space, it seems to be
>    quite wasteful to clutter it with 5 flags that are all
>    resolution-related. Arguably O_NOFOLLOW is also a resolution flag but
>    its entire purpose is to error out if you encounter a trailing
>    symlink -- not to scope resolution.
>
>  * Other systems would be able to reimplement this syscall allowing for
>    cross-OS standardisation rather than being hidden amongst O_* flags
>    which may result in it not being used by all the parties that might
>    want to use it (file servers, web servers, container runtimes, etc).
>
>  * It gives us the opportunity to iterate on the O_PATH interface. In
>    particular, the new @how->upgrade_mask field for fd re-opening is
>    only possible because we have a clean slate without needing to re-use
>    the ACC_MODE flag design nor the existing openat(2) @mode semantics.
>
> To this end, we introduce the openat2(2) syscall. It provides all of the
> features of openat(2) through the @how->flags argument, but also
> provides a new @how->resolve argument which exposes RESOLVE_* flags
> that map to our new LOOKUP_* flags. It also eliminates the long-standing
> ugliness of variadic-open(2) by embedding it in a struct.
>
> In order to allow for userspace to lock down their usage of file
> descriptor re-opening, openat2(2) has the ability for users to disallow
> certain re-opening modes through @how->upgrade_mask. At the moment,
> there is no UPGRADE_NOEXEC. The open_how struct is padded to 64 bytes
> for future extensions (all of the reserved bits must be zeroed).

Why pad the structure when new functionality (perhaps accommodated via a larger structure) could be signaled by passing a new flag? Reserving fields makes a lot of sense for a structure whose size is fixed by the ABI: pthread_mutex_t, for example, can't grow. But this structure can grow, so the reservation seems needless to me.

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers