On Thu, Apr 11, 2024 at 12:15 PM Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > On Thu, 11 Apr 2024 at 02:05, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > > > I had a similar discussion a while back someone requested that we relax > > permissions so linkat can be used in containers. > > Hmm. > > Ok, that's different - it just wants root to be able to do it, but > "root" being just in the container itself. > > I don't think that's all that useful - I think one of the issues with > linkat(AT_EMPTY_PATH) is exactly that "it's only useful for root", > which means that it's effectively useless. Inside a container or out. > > Because very few loads run as root-only (and fewer still run with any > capability bits that aren't just "root or nothing"). > > Before I did all this, I did a Debian code search for linkat with > AT_EMPTY_PATH, and it's almost non-existent. And I think it's exactly > because of this "when it's only useful for root, it's hardly useful at > all" issue. > > (Of course, my Debian code search may have been broken). > > So I suspect your special case is actually largely useless, and what > the container user actually wanted was what my patch does, but they > didn't think that was possible, so they asked to just extend the > "root" notion. > Yes, that is absolutely the case. When Christian poked holes in my initial submission, I started working on a v2 but haven't had a chance to send it because I wanted to make sure my arguments etc were airtight because I am well aware of just how long and storied the history of flink is. In the v2 that I didn't post yet, I extended the ability to link *any* file from only true root to also "root" within a container following the potentially suspect approach that christian suggested (I see where you are coming from with the unobvious approach to avoiding toctou though I obviously support this patch being merged), but I added a followup patch that checks for whether the file in question is an `__O_TMPFILE` file which is trivial once we are jumping through the hoops of looking up the struct file. If it is a tmpfile (i.e. linkcount = 0) I think that all the concerns raised about processes that inherit a fd being able to create links to the file inappropriately are moot. Here is an excerpt from the cover letter I was planning to send that explains my reasoning. - the ability to create an unnamed file, adjust its contents and attributes (i.e. uid, gid, mode, xattrs, etc) and then only give it a name once they are ready is exactly the ideal use-case for a hypothetical `flink` system call. The ability to use `AT_EMPTY_PATH` with `linkat` essentially turns it into `flink`, but these restrictions hamper it from actually being used, as most code is not privileged. By checking whether the file to be linked is a temporary (i.e. unnamed) file we can allow this useful case without allowing the more risky cases that could have security implications. - Although it might appear that the security posture is affected, it is not. Callers who open an extant file on disk in read only mode for sharing with potentially untrustworthy code can trust that this change will not affect them because it only applies to temporary files. Callers who open a temporary file need to do so with write access if they want to actually put any data in it, so they will already have to reopen the file (e.g. by linking it to a path and opening that path, or opening the magic symlink in proc) if they want to share it in read-only mode with untrustworthy code. As such, that new file description will no longer be marked as a temporary file and thus these changes do not apply. The final case I can think of is the unlikely possibility of intentionally sharing read write access to data stored in a temporary file with untrustworthy code, but where that code being able to make a link would still be deleterious. In such a bizarre case, the `O_EXCL` can and should be used to fully prevent the temporary file from ever having a name, and this change does not alter its efficacy. > I've added Charles to the Cc. > > But yes, with my patch, it would now be trivial to make that > > capable(CAP_DAC_READ_SEARCH) > > test also be > > ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH) > > instead. I suspect not very many would care any more, but it does seem > conceptually sensible. > > As to your patch - I don't like your nd->root games in that patch at > all. That looks odd. > > Yes, it makes lookup ignore the dfd (so you avoid the TOCTOU issue), > but it also makes lookup ignore "/". Which happens to be ok with an > empty path, but still... > > So it feels to me like that patch of yours mis-uses something that is > just meant for vfs_path_lookup(). > > It may happen to work, but it smells really odd to me. > > Linus >