On Thu, 2024-07-25 at 10:38 +0200, Christian Brauner wrote: > On Wed, Jul 24, 2024 at 09:56:09AM GMT, Jeff Layton wrote: > > On Wed, 2024-07-24 at 15:15 +0200, Christian Brauner wrote: > > > Systemd has a helper called openat_report_new() that returns whether a > > > file was created anew or it already existed before for cases where > > > O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't > > > something that's specific to systemd but it's where I noticed it. > > > > > > The current logic is that it first attempts to open the file without > > > O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both > > > flags. If that succeeds all is well. If it now reports EEXIST it > > > retries. > > > > > > That works fairly well but some corner cases make this more involved. If > > > this operates on a dangling symlink the first openat() without O_CREAT | > > > O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL > > > will fail with EEXIST. The reason is that openat() without O_CREAT | > > > O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security > > > reasons. So it's not something we can really change unless we add an > > > explicit opt-in via O_FOLLOW which seems really ugly. > > > > > > The caller could try and use fanotify() to register to listen for > > > creation events in the directory before calling openat(). The caller > > > could then compare the returned tid to its own tid to ensure that even > > > in threaded environments it actually created the file. That might work > > > but is a lot of work for something that should be fairly simple and I'm > > > uncertain about it's reliability. > > > > > > The caller could use a bpf lsm hook to hook into security_file_open() to > > > figure out whether they created the file. That also seems a bit wild. > > > > > > So let's add F_CREATED which allows the caller to check whether they > > > actually did create the file. That has caveats of course but I don't > > > think they are problematic: > > > > > > * In multi-threaded environments a thread can only be sure that it did > > > create the file if it calls openat() with O_CREAT. In other words, > > > it's obviously not enough to just go through it's fdtable and check > > > these fds because another thread could've created the file. > > > > > > > Not exactly. FMODE_CREATED is set in the file description. In principle > > a userland program should know which thread actually did the the open() > > that results in each fd. This new interface tells us which fd's open > > actually resulted in the file being created (which is good). > > > > In any case, I don't see this as a problem. The interface does what it > > says on the tin. > > > > > * If there's any codepaths where an openat() with O_CREAT would yield > > > the same struct file as that of another thread it would obviously > > > cause wrong results. I'm not aware of any such codepaths from openat() > > > itself. Imho, that would be a bug. > > > > > > > Definitely a bug. That said, this will have interesting interactions > > with dup that may need to be documented. IOW, if you dup a file with > > FMODE_CREATED, then the new fd will also report that F_CREATED is true. > > Yes, but it's the behavior one expects. Dup'ing an fd implies fd1->file > == fd2->file and so it's correct behavior. IOW, someone must confuse > open() with dup() to be surprised by this. It's the behavior _we_ generally expect, but a lot of people don't understand the subtleties of file descriptions vs. file descriptors. Spelling out this behavior explicitly would probably prevent a few headaches. -- Jeff Layton <jlayton@xxxxxxxxxx>