Re: [PATCH RFC 1/2] fcntl: add F_CREATED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed 24-07-24 15:15:35, Christian Brauner wrote:
> Systemd has a helper called openat_report_new() that returns whether a
> file was created anew or it already existed before for cases where
> O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't
> something that's specific to systemd but it's where I noticed it.
> 
> The current logic is that it first attempts to open the file without
> O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both
> flags. If that succeeds all is well. If it now reports EEXIST it
> retries.
> 
> That works fairly well but some corner cases make this more involved. If
> this operates on a dangling symlink the first openat() without O_CREAT |
> O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL
> will fail with EEXIST. The reason is that openat() without O_CREAT |
> O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't for security
> reasons. So it's not something we can really change unless we add an
> explicit opt-in via O_FOLLOW which seems really ugly.
> 
> The caller could try and use fanotify() to register to listen for
> creation events in the directory before calling openat(). The caller
> could then compare the returned tid to its own tid to ensure that even
> in threaded environments it actually created the file. That might work
> but is a lot of work for something that should be fairly simple and I'm
> uncertain about it's reliability.
> 
> The caller could use a bpf lsm hook to hook into security_file_open() to
> figure out whether they created the file. That also seems a bit wild.
> 
> So let's add F_CREATED which allows the caller to check whether they
> actually did create the file. That has caveats of course but I don't
> think they are problematic:
> 
> * In multi-threaded environments a thread can only be sure that it did
>   create the file if it calls openat() with O_CREAT. In other words,
>   it's obviously not enough to just go through it's fdtable and check
>   these fds because another thread could've created the file.
> 
> * If there's any codepaths where an openat() with O_CREAT would yield
>   the same struct file as that of another thread it would obviously
>   cause wrong results. I'm not aware of any such codepaths from openat()
>   itself. Imho, that would be a bug.
> 
> * Related to the previous point, calling the new fcntl() on files created
>   and opened via special-purpose system calls or ioctl()s would cause
>   wrong results only if the affected subsystem a) raises FMODE_CREATED
>   and b) may return the same struct file for two different calls. I'm
>   not seeing anything outside of regular VFS code that raises
>   FMODE_CREATED.
> 
>   There is code for b) in e.g., the drm layer where the same struct file
>   is resurfaced but again FMODE_CREATED isn't used and it would be very
>   misleading if it did.
> 
> Link: https://github.com/systemd/systemd/blob/11d5e2b5fbf9f6bfa5763fd45b56829ad4f0777f/src/basic/fs-util.c#L1078 [1]
> Signed-off-by: Christian Brauner <brauner@xxxxxxxxxx>

Looks like a sensible functionality. I agree with Jeff that the behavior
wrt dup(2) / fork(2) needs to be clearly stated but is hardly surprising
given how everything else with file descriptors and descriptions works.
So feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

								Honza

> ---
>  fs/fcntl.c                 | 10 ++++++++++
>  include/uapi/linux/fcntl.h |  3 +++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/fs/fcntl.c b/fs/fcntl.c
> index 300e5d9ad913..55a66ad9b432 100644
> --- a/fs/fcntl.c
> +++ b/fs/fcntl.c
> @@ -343,6 +343,12 @@ static long f_dupfd_query(int fd, struct file *filp)
>  	return f.file == filp;
>  }
>  
> +/* Let the caller figure out whether a given file was just created. */
> +static long f_created(const struct file *filp)
> +{
> +	return !!(filp->f_mode & FMODE_CREATED);
> +}
> +
>  static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>  		struct file *filp)
>  {
> @@ -352,6 +358,9 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>  	long err = -EINVAL;
>  
>  	switch (cmd) {
> +	case F_CREATED:
> +		err = f_created(filp);
> +		break;
>  	case F_DUPFD:
>  		err = f_dupfd(argi, filp, 0);
>  		break;
> @@ -463,6 +472,7 @@ static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
>  static int check_fcntl_cmd(unsigned cmd)
>  {
>  	switch (cmd) {
> +	case F_CREATED:
>  	case F_DUPFD:
>  	case F_DUPFD_CLOEXEC:
>  	case F_DUPFD_QUERY:
> diff --git a/include/uapi/linux/fcntl.h b/include/uapi/linux/fcntl.h
> index c0bcc185fa48..d78a6c237688 100644
> --- a/include/uapi/linux/fcntl.h
> +++ b/include/uapi/linux/fcntl.h
> @@ -16,6 +16,9 @@
>  
>  #define F_DUPFD_QUERY	(F_LINUX_SPECIFIC_BASE + 3)
>  
> +/* Was the file just created? */
> +#define F_CREATED	(F_LINUX_SPECIFIC_BASE + 4)
> +
>  /*
>   * Cancel a blocking posix lock; internal use only until we expose an
>   * asynchronous lock api to userspace:
> 
> -- 
> 2.43.0
> 
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux