> > > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and > > > foo(/proc/self/fd/<fd>) should always be identical, and the current > > > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always > > > assume that keeping them makes sense (the most obvious example is being > > > able to do tricks to open /proc/$pid/exe as O_RDWR). > > > > Please make a note that I have applications relying on current magic symlink > > semantics w.r.t setxattr() and other metadata operations, and the libselinux > > commit linked from the patch commit message proves that magic symlink > > semantics are used in the wild, so it is not likely that those semantics could > > be changed, unless userspace breakage could be justified by fixing a serious > > security issue (i.e. open /proc/$pid/exe as O_RDWR). > > Agreed. We also use magiclinks for similar TOCTOU-protection purposes in > runc (as does lxc) as well as in libpathrs so I'm aware we need to be > careful about changing existing behaviours. I would prefer to have the > default be as restrictive as possible, but naturally back-compat is > more important. > > > > I suspect that the long-term solution would be to have more upgrade > > > masks so that userspace can opt-in to not allowing any kind of > > > (metadata) write access through a particular file descriptor. You're > > > quite right that we have several metadata write AT_EMPTY_PATH APIs, and > > > so we can't retroactively block /everything/ but we should try to come > > > up with less leaky rules by default if it won't break userspace. > > > > Ok, let me try to say this in my own words using an example to see that > > we are all on the same page: > > > > - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races > > - fsetxattr(fd,...) is not applicable for symbolic links > > While I agree with Christian's concerns about making O_PATH descriptors > more leaky, if userspace already relies on this through /proc/self/fd/$x > then there's not much we can do about it other than having an opt-out > available in openat2(2). Having the option to disable this stuff to > avoid making O_PATH descriptors less safe as a mechanism for passing > around "capability-less" file handles should make most people happy > (with the note that ideally we would not be *adding* capabilities to > O_PATH we don't need to). > > > - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races > > when setting xattr on symbolic links > > - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as a the > > "new API" for setting xattr on symlinks (and special files) > > If this is a usecase we need to support then we may as well just re-use > fsetxattr() since it's basically an *at(2) syscall already (and I don't > see why we'd want to split up the capabilities between two similar > *at(2)-like syscalls). Though this does come with the above caveats that > we need to have the opt-outs available if we're going to enshrine this > as intentional part of the ABI. Christian preferred that new functionality be added with a new API and I agree that this is nicer and more explicit. The bigger question IMO is, whether fsomething() should stay identical to somethingat(,,,AT_EMPTY_PATH). I don't think that it should. To me, open(path,O_PATH)+somethingat(,,,AT_EMPTY_PATH) is identical to something(path) - it just breaks the path resolution and operation to two distinguished steps. fsomething() was traditionally used for "really" open fds, so if we don't need to, we better not relax it further by allowing O_PATH, but that's just one opinion. > > > - The new API is going to be more strict than the old magic symlink API > > - *If* it turns out to not break user applications, old API can also become > > more strict to align with new API (unlikely the case for setxattr()) > > - This will allow sandboxed containers to opt-out of the "old API", by > > restricting access to /proc/self/fd and to implement more fine grained > > control over which metadata operations are allowed on an O_PATH fd > > > > Did I understand the plan correctly? > > Yup, except I don't think we need setxattrat(2). > > > Do you agree with me that the plan to keep AT_EMPTY_PATH and magic > > symlink semantics may not be realistic? > > To clarify -- my view is that if any current /proc/self/fd/$n semantic > needs to be maintained then I would prefer that the proc-less method of > doing it (such as through AT_EMPTY_PATH et al) would have the same > capability and semantics. There are some cases where the current > /proc/self/fd/$n semantics need to be fixed (such as the /proc/$pid/exe > example) and in that case the proc-less semantics also need to be made > safe. > > While I would like us to restrict O_PATH as much as possible, if > userspace already depends on certain behaviour then we may not be able > to do much about it. Having an opt-out would be very important since > enshrining these leaky behaviours (which seem to have been overlooked) > means we need to consider how userspace can opt out of them. > > Unfortunately, it should be noted that due to the "magical" nature of > nd_jump_link(), I'm not sure how happy Al Viro will be with the kinds of > restrictions necessary. Even my current (quite limited) upgrade-mask > patchset has to do a fair bit of work to unify the semantics of > magic-links and openat(O_EMPTYPATH) -- expanding this to all *at(2) > syscalls might be quite painful. (There are also several handfuls of > semantic questions which need to be answered about magic-link modes and > whether for other *at(2) operations we may need even more complicated > rules or even a re-thinking of my current approach.) > The question remains, regarding the $SUBJECT patch, is it fair to block it and deprive libselinux of a non buggy API until such time that all the details around masking O_PATH fds will be made clear and the new API implemented? There is no guarantee this will ever happen, so it does not seem reasonable to me. To be a reasonable reaction to the currently broken API is to either accept the patch as is or request that setxattrat() will be added to provide the new functionality. Thanks, Amir.