And a slightly dubious addition to bypass these checks for tmpfiles across the board. I don't like putting the details of this in the path lookup code, but it is definitely nicer here than looking up the fd twice, once here and again in do_linkat These changes not strictly necessary, because the existing patch allows unprivileged tmpfile flink as long as the creds don't change, but I do think that my reasoning is sound that linking a tmpfile should always be ok whether or not creds have changed --- fs/namei.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/namei.c b/fs/namei.c index 2bcc10976ba7..5c9fabc39481 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2421,6 +2421,7 @@ static const char *path_init(struct nameidata *nd, unsigned flags) if (flags & LOOKUP_DFD_MATCH_CREDS) { if (f.file->f_cred != current_cred() && + !(f.file->f_flags & __O_TMPFILE) && !ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)) { fdput(f); return ERR_PTR(-ENOENT); On Thu, Apr 11, 2024 at 1:29 PM Charles Mirabile <cmirabil@xxxxxxxxxx> wrote: > > Here is an updated version of linus's patch that makes the check > namespace relative > --- > fs/namei.c | 17 ++++++++++++----- > include/linux/namei.h | 1 + > 2 files changed, 13 insertions(+), 5 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 4e0de939fea1..2bcc10976ba7 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -2419,6 +2419,14 @@ static const char *path_init(struct nameidata > *nd, unsigned flags) > if (!f.file) > return ERR_PTR(-EBADF); > > + if (flags & LOOKUP_DFD_MATCH_CREDS) { > + if (f.file->f_cred != current_cred() && > + !ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)) { > + fdput(f); > + return ERR_PTR(-ENOENT); > + } > + } > + > dentry = f.file->f_path.dentry; > > if (*s && unlikely(!d_can_lookup(dentry))) { > @@ -4640,14 +4648,13 @@ int do_linkat(int olddfd, struct filename > *old, int newdfd, > goto out_putnames; > } > /* > - * To use null names we require CAP_DAC_READ_SEARCH > + * To use null names we require CAP_DAC_READ_SEARCH or > + * that the open-time creds of the dfd matches current. > * This ensures that not everyone will be able to create > * handlink using the passed filedescriptor. > */ > - if (flags & AT_EMPTY_PATH && !capable(CAP_DAC_READ_SEARCH)) { > - error = -ENOENT; > - goto out_putnames; > - } > + if (flags & AT_EMPTY_PATH) > + how |= LOOKUP_DFD_MATCH_CREDS; > > if (flags & AT_SYMLINK_FOLLOW) > how |= LOOKUP_FOLLOW; > diff --git a/include/linux/namei.h b/include/linux/namei.h > index 74e0cc14ebf8..678ffe4acf99 100644 > --- a/include/linux/namei.h > +++ b/include/linux/namei.h > @@ -44,6 +44,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT}; > #define LOOKUP_BENEATH 0x080000 /* No escaping from starting point. */ > #define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as fs root. */ > #define LOOKUP_CACHED 0x200000 /* Only do cached lookup */ > +#define LOOKUP_DFD_MATCH_CREDS 0x400000 /* Require that dfd creds > match current */ > /* LOOKUP_* flags which do scope-related checks based on the dirfd. */ > #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT) > > -- > 2.44.0 > > On Thu, Apr 11, 2024 at 12:44 PM Charles Mirabile <cmirabil@xxxxxxxxxx> wrote: > > > > On Thu, Apr 11, 2024 at 12:15 PM Linus Torvalds > > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > > > > > On Thu, 11 Apr 2024 at 02:05, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > > > > > I had a similar discussion a while back someone requested that we relax > > > > permissions so linkat can be used in containers. > > > > > > Hmm. > > > > > > Ok, that's different - it just wants root to be able to do it, but > > > "root" being just in the container itself. > > > > > > I don't think that's all that useful - I think one of the issues with > > > linkat(AT_EMPTY_PATH) is exactly that "it's only useful for root", > > > which means that it's effectively useless. Inside a container or out. > > > > > > Because very few loads run as root-only (and fewer still run with any > > > capability bits that aren't just "root or nothing"). > > > > > > Before I did all this, I did a Debian code search for linkat with > > > AT_EMPTY_PATH, and it's almost non-existent. And I think it's exactly > > > because of this "when it's only useful for root, it's hardly useful at > > > all" issue. > > > > > > (Of course, my Debian code search may have been broken). > > > > > > So I suspect your special case is actually largely useless, and what > > > the container user actually wanted was what my patch does, but they > > > didn't think that was possible, so they asked to just extend the > > > "root" notion. > > > > > Yes, that is absolutely the case. When Christian poked holes in my > > initial submission, I started working on a v2 but haven't had a chance > > to send it because I wanted to make sure my arguments etc were > > airtight because I am well aware of just how long and storied the > > history of flink is. In the v2 that I didn't post yet, I extended the > > ability to link *any* file from only true root to also "root" within a > > container following the potentially suspect approach that christian > > suggested (I see where you are coming from with the unobvious approach > > to avoiding toctou though I obviously support this patch being > > merged), but I added a followup patch that checks for whether the file > > in question is an `__O_TMPFILE` file which is trivial once we are > > jumping through the hoops of looking up the struct file. If it is a > > tmpfile (i.e. linkcount = 0) I think that all the concerns raised > > about processes that inherit a fd being able to create links to the > > file inappropriately are moot. Here is an excerpt from the cover > > letter I was planning to send that explains my reasoning. > > > > - the ability to create an unnamed file, adjust its contents > > and attributes (i.e. uid, gid, mode, xattrs, etc) and then only give it a name > > once they are ready is exactly the ideal use-case for a hypothetical `flink` > > system call. The ability to use `AT_EMPTY_PATH` with `linkat` essentially turns > > it into `flink`, but these restrictions hamper it from actually being used, as > > most code is not privileged. By checking whether the file to be linked is a > > temporary (i.e. unnamed) file we can allow this useful case without allowing > > the more risky cases that could have security implications. > > > > - Although it might appear that the security posture is affected, it is not. > > Callers who open an extant file on disk in read only mode for sharing with > > potentially untrustworthy code can trust that this change will not affect them > > because it only applies to temporary files. Callers who open a temporary file > > need to do so with write access if they want to actually put any data in it, > > so they will already have to reopen the file (e.g. by linking it to a path > > and opening that path, or opening the magic symlink in proc) if they want to > > share it in read-only mode with untrustworthy code. As such, that new file > > description will no longer be marked as a temporary file and thus these changes > > do not apply. The final case I can think of is the unlikely possibility of > > intentionally sharing read write access to data stored in a temporary file with > > untrustworthy code, but where that code being able to make a link would still > > be deleterious. In such a bizarre case, the `O_EXCL` can and should be used to > > fully prevent the temporary file from ever having a name, and this change does > > not alter its efficacy. > > > > > I've added Charles to the Cc. > > > > > > But yes, with my patch, it would now be trivial to make that > > > > > > capable(CAP_DAC_READ_SEARCH) > > > > > > test also be > > > > > > ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH) > > > > > > instead. I suspect not very many would care any more, but it does seem > > > conceptually sensible. > > > > > > As to your patch - I don't like your nd->root games in that patch at > > > all. That looks odd. > > > > > > Yes, it makes lookup ignore the dfd (so you avoid the TOCTOU issue), > > > but it also makes lookup ignore "/". Which happens to be ok with an > > > empty path, but still... > > > > > > So it feels to me like that patch of yours mis-uses something that is > > > just meant for vfs_path_lookup(). > > > > > > It may happen to work, but it smells really odd to me. > > > > > > Linus > > >