Re: [RFC PATCH] fs: introduce mkdirat2 syscall for atomic mkdir

On 01.03.21 at 20:02, J. Bruce Fields wrote:
> On Sun, Feb 28, 2021 at 02:58:22AM +0000, Al Viro wrote:
>> TBH, I don't understand what are you trying to achieve -
>> what will that mkdir+open combination buy you, especially
>> since that atomicity goes straight out of window if you try
>> to use that on e.g. NFS.  How is the userland supposed to make
>> use of that thing?
> 
> For what it's worth, the RPC that creates a directory can also get a
> filehandle of the new directory, so I don't think there's anything in
> the NFS protocol that would *prevent* implementing this.  (Whether
> that's useful, I don't know.)

The same applies to SMB, there's only a single SMB2/3 Create call,
which is able to create/open files or directories and returns
an open file handle for it.

With an atomic mkdir+open it would be possible to have a single round trip
between client and server. It would help on the client, but also Samba
as a server, as we would be able to skip additional syscalls.

And it would be great to have a way to specify flags similar to O_CREAT
and O_EXCL in order to create a new directory or open an existing one.

It should also be possible to pass in RESOLVE_* flags, so a call similar to openat2()
would be great.

For me, openat2() with O_CREAT | O_EXCL | O_DIRECTORY would be the natural thing to
support, because it maps naturally onto the SMB protocol. But Al seems to hate that, and
I'm fine with his arguments against it.

Plus we can't use that anyway, as that flag combination is currently not rejected
with EINVAL; instead a regular file is created on disk, but -1 ENOTDIR
is returned to userspace.

Currently userspace needs to do something like this in order to safely open
a directory given as an untrusted path string (userdirpath), creating it
if it doesn't exist:

1. make a copy of userdirpath and call dirname() => dirnameresult
2. make a copy of userdirpath and call basename() => basenameresult
3. call dirfd = openat2(basedirfd, dirnameresult, how = {.flags = O_PATH | O_CLOEXEC, .resolve = RESOLVE_BENEATH});
4. call mkdirat(dirfd, basenameresult, 0755)
5. call close(dirfd)
6. ignore possible EEXIST from mkdirat
7. call fd = openat2(basedirfd, userdirpath, how = { .flags = O_DIRECTORY | O_CLOEXEC, .resolve = RESOLVE_BENEATH});

This requires memory allocations and 4 syscall round trips.
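
For illustration, the seven steps above could be sketched in C roughly as
follows. This is only a sketch under a few assumptions: the helper names
openat2_beneath() and open_or_create_dir_beneath() are made up for this
example, openat2() is invoked through syscall(2) on the assumption that libc
provides no wrapper, and it needs Linux 5.6 or later with <linux/openat2.h>
available.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <libgen.h>
#include <linux/openat2.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: openat2() with RESOLVE_BENEATH via syscall(2). */
static int openat2_beneath(int dirfd, const char *path, unsigned long long flags)
{
	struct open_how how;

	memset(&how, 0, sizeof(how));
	how.flags = flags;
	how.resolve = RESOLVE_BENEATH;
	return (int)syscall(SYS_openat2, dirfd, path, &how, sizeof(how));
}

/*
 * Hypothetical helper following steps 1-7: open userdirpath below
 * basedirfd as a directory, creating the last component if it is missing.
 * Returns an fd on success, -1 with errno set on failure.
 */
int open_or_create_dir_beneath(int basedirfd, const char *userdirpath, mode_t mode)
{
	/* Steps 1 and 2: dirname()/basename() may modify their argument,
	 * so each one gets its own copy. */
	char *dcopy = strdup(userdirpath);
	char *bcopy = strdup(userdirpath);
	int dirfd = -1, fd = -1, saved_errno;

	if (!dcopy || !bcopy) {
		errno = ENOMEM;
		goto out;
	}

	/* Step 3: resolve the parent directory without escaping basedirfd. */
	dirfd = openat2_beneath(basedirfd, dirname(dcopy), O_PATH | O_CLOEXEC);
	if (dirfd < 0)
		goto out;

	/* Steps 4 and 6: create the last component, tolerating EEXIST. */
	if (mkdirat(dirfd, basename(bcopy), mode) < 0 && errno != EEXIST)
		goto out;

	/* Step 7: open the full path, again confined below basedirfd. */
	fd = openat2_beneath(basedirfd, userdirpath, O_DIRECTORY | O_CLOEXEC);
out:
	saved_errno = errno;
	/* Step 5: drop the parent fd (done last here to share the cleanup path). */
	if (dirfd >= 0)
		close(dirfd);
	free(dcopy);
	free(bcopy);
	errno = saved_errno;
	return fd;
}
```

A caller would use it as fd = open_or_create_dir_beneath(basedirfd, userdirpath, 0755).
Note that the sequence is still not atomic: another process can replace the
directory between the mkdirat() and the final openat2().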

It would be wonderful to just have a single syscall for this.
I'm not sure about the exact details of the API or a possible name
for such a syscall (mkdirat2 seems wrong), but it could look like this:

struct somenewdirsyscall_how {
	__u64 flags; /* only O_CLOEXEC, O_CREAT, O_EXCL */
	__u64 mode;
	__u64 resolve;
};

Instead of reusing O_* flags, new defines could also be used.

fd = somenewdirsyscall(basedirfd, userdirpath, how = { .flags = O_CLOEXEC | O_CREAT, .mode = 0755, .resolve = RESOLVE_BENEATH});

What would be a good way forward here?

metze


