I spent half of dinner last night being complained to by one of our hardware engineers about Linux's lack of support for the flags argument to fchmodat(). This all came about because of a FUSE filesystem implementation, and while there are some application-specific workarounds for the issue it seemed to me like the cleanest bet was to just go add another fchmodat() that supports flags to the kernel. The actual implementation is super simple: essentially it's just the same as fchmodat(), but LOOKUP_FOLLOW is conditionally set based on the flags. I've attempted to make this match "man 2 fchmodat" as closely as possible, which says EINVAL is returned for invalid flags (as opposed to ENOTSUPP, which is currently returned by glibc for AT_SYMLINK_NOFOLLOW). I have a sketch of a glibc patch that I haven't even compiled yet, but seems fairly straight-forward: diff --git a/sysdeps/unix/sysv/linux/fchmodat.c b/sysdeps/unix/sysv/linux/fchmodat.c index 6d9cbc1ce9e0..b1beab76d56c 100644 --- a/sysdeps/unix/sysv/linux/fchmodat.c +++ b/sysdeps/unix/sysv/linux/fchmodat.c @@ -29,12 +29,36 @@ int fchmodat (int fd, const char *file, mode_t mode, int flag) { - if (flag & ~AT_SYMLINK_NOFOLLOW) - return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL); -#ifndef __NR_lchmod /* Linux so far has no lchmod syscall. */ + /* There are four paths through this code: + - The flags are zero. In this case it's fine to call fchmodat. + - The flags are non-zero and glibc doesn't have access to + __NR_fchmodat4. In this case all we can do is emulate the error codes + defined by the glibc interface from userspace. + - The flags are non-zero, glibc has __NR_fchmodat4, and the kernel has + fchmodat4. This is the simplest case, as the fchmodat4 syscall exactly + matches glibc's library interface so it can be called directly. + - The flags are non-zero, glibc has __NR_fchmodat4, but the kernel does + not. In this case we must respect the error codes defined by the glibc + interface instead of returning ENOSYS. + The intent here is to ensure that the kernel is called at most once per + library call, and that the error types defined by glibc are always + respected. */ + +#ifdef __NR_fchmodat4 + long result; +#endif + + if (flag == 0) + return INLINE_SYSCALL (fchmodat, 3, fd, file, mode); + +#ifdef __NR_fchmodat4 + result = INLINE_SYSCALL (fchmodat4, 4, fd, file, mode, flag); + if (result == 0 || errno != ENOSYS) + return result; +#endif + if (flag & AT_SYMLINK_NOFOLLOW) return INLINE_SYSCALL_ERROR_RETURN_VALUE (ENOTSUP); -#endif - return INLINE_SYSCALL (fchmodat, 3, fd, file, mode); + return INLINE_SYSCALL_ERROR_RETURN_VALUE (EINVAL); } I've never added a new syscall before so I'm not really sure what the proper procedure to follow is. I'm assuming any new syscall will involve fairly significant discussion, so I've just done the minimum of an implementation for this patch set. Specifically, I've: * Defined a new syscall that looks like fchmodat but includes a flag argument, which I'm calling fchmodat4 because it has 4 arguments. I don't know if that's the correct naming convention, and don't really have any skin in that game. * Implemented that syscall by extending the fchmod code to handle flags, which is pretty straight-forward. I think it's sane, but given that it's so simple I'm not sure if I'm missing something -- specifically, I didn't go check to make sure the semantics of AT_SYMLINK_NOFOLLOW match !LOOKUP_FOLLOW. I'm assuming the do, but sometimes when I look at something and say "that's so simple, how is it broken" I'm actually just missing something entirely. * Added an asm-generic syscall number for this, which I assume I'm supposed to do this first as it looks like we're trying to keep the numbers in sync everywhere. * Added x86 syscalls for this so I could test it. I also cleaned up a checkpatch issue in fchmodat(). I only found this because I copied the fchmodat() interface for fchmodat4() and it threw the warning, I don't personally care either way as to whether or not the space is in there. I've given this fairly minimal testing. Essentially all I've done is booted up 5.1.6 with this patch set on my local development box and run $ touch test-file $ ln -s test-file test-link $ cat > test.c #include <fcntl.h> #include <stdio.h> #include <unistd.h> int main(int argc, char **argv) { long out; out = syscall(428, AT_FDCWD, "test-file", 0x888, AT_SYMLINK_NOFOLLOW); printf("fchmodat4(AT_FDCWD, \"test-file\", 0x888, AT_SYMLINK_NOFOLLOW): %ld\n", out); out = syscall(428, AT_FDCWD, "test-file", 0x888, 0); printf("fchmodat4(AT_FDCWD, \"test-file\", 0x888, 0): %ld\n", out); out = syscall(268, AT_FDCWD, "test-file", 0x888); printf("fchmodat(AT_FDCWD, \"test-file\", 0x888): %ld\n", out); out = syscall(428, AT_FDCWD, "test-link", 0x888, AT_SYMLINK_NOFOLLOW); printf("fchmodat4(AT_FDCWD, \"test-link\", 0x888, AT_SYMLINK_NOFOLLOW): %ld\n", out); out = syscall(428, AT_FDCWD, "test-link", 0x888, 0); printf("fchmodat4(AT_FDCWD, \"test-link\", 0x888, 0): %ld\n", out); out = syscall(268, AT_FDCWD, "test-link", 0x888); printf("fchmodat(AT_FDCWD, \"test-link\", 0x888): %ld\n", out); return 0; } $ gcc test.c -o test $ ./test fchmodat4(AT_FDCWD, "test-file", 0x888, AT_SYMLINK_NOFOLLOW): 0 fchmodat4(AT_FDCWD, "test-file", 0x888, 0): 0 fchmodat(AT_FDCWD, "test-file", 0x888): 0 fchmodat4(AT_FDCWD, "test-link", 0x888, AT_SYMLINK_NOFOLLOW): -1 fchmodat4(AT_FDCWD, "test-link", 0x888, 0): 0 fchmodat(AT_FDCWD, "test-link", 0x888): 0 While I don't think there's any reason what's there is unacceptable, I don't really consider this finished. I couldn't find a cookbook for "here's how you add a system call", but all I really did was "git grep add | grep syscall" so if there's something out there then please let me know and I'll follow it. Specifically, I haven't: * Added any sort of documentation. I don't find anything with a "git grep fchmodat", so I'm assuming it's just the man pages that are relevant here. * Fixed any of the other architectures. I'm assuming this is just the mechanical process of fixing all these in the same way I did for x86. arch/alpha/kernel/syscalls/syscall.tbl:461 common fchmodat sys_fchmodat arch/arm/tools/syscall.tbl:333 common fchmodat sys_fchmodat arch/arm64/include/asm/unistd32.h:#define __NR_fchmodat 333 arch/arm64/include/asm/unistd32.h:__SYSCALL(__NR_fchmodat, sys_fchmodat) arch/ia64/kernel/fsys.S: data8 0 // fchmodat arch/ia64/kernel/syscalls/syscall.tbl:268 common fchmodat sys_fchmodat arch/m68k/kernel/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat arch/microblaze/kernel/syscalls/syscall.tbl:306 common fchmodat sys_fchmodat arch/mips/kernel/syscalls/syscall_n32.tbl:262 n32 fchmodat sys_fchmodat arch/mips/kernel/syscalls/syscall_n64.tbl:258 n64 fchmodat sys_fchmodat arch/mips/kernel/syscalls/syscall_o32.tbl:299 o32 fchmodat sys_fchmodat arch/parisc/kernel/syscalls/syscall.tbl:286 common fchmodat sys_fchmodat arch/powerpc/kernel/syscalls/syscall.tbl:297 common fchmodat sys_fchmodat arch/s390/kernel/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat sys_fchmodat arch/sh/include/uapi/asm/unistd_64.h:#define __NR_fchmodat 334 arch/sh/kernel/syscalls/syscall.tbl:306 common fchmodat sys_fchmodat arch/sh/kernel/syscalls_64.S: .long sys_fchmodat arch/sparc/kernel/syscalls/syscall.tbl:295 common fchmodat sys_fchmodat arch/xtensa/kernel/syscalls/syscall.tbl:300 common fchmodat sys_fchmodat * Looked at anything in tools. Again, I'm assuming it's just a mechanical process of looking at all of these and adding fchmodat4. tools/include/nolibc/nolibc.h:#ifdef __NR_fchmodat tools/include/nolibc/nolibc.h: return my_syscall4(__NR_fchmodat, AT_FDCWD, path, mode, 0); tools/include/uapi/asm-generic/unistd.h:#define __NR_fchmodat 53 tools/include/uapi/asm-generic/unistd.h:__SYSCALL(__NR_fchmodat, sys_fchmodat) tools/perf/arch/powerpc/entry/syscalls/syscall.tbl:297 common fchmodat sys_fchmodat tools/perf/arch/s390/entry/syscalls/syscall.tbl:299 common fchmodat sys_fchmodat compat_sys_fchmodat tools/perf/arch/x86/entry/syscalls/syscall_64.tbl:268 common fchmodat __x64_sys_fchmodat tools/perf/builtin-trace.c: { .name = "fchmodat", * Done anything with userspace, aside from thinking about the glibc code above. I'd assume that I'm meant to bring in libc-alpha to the discussion, but I didn't want to do so this early in case this was just a non-starter. I'm happy dealing with all of that, but given that I'm assuming there's going to be some discussion I wanted to send out the proof-of-concept first to see if this has any legs. Aside from the glibc side the remaining work smells pretty mechanical, so I figured I'd wait on that until I knew it wasn't going to be a waste of time -- partially because I'm lazy, but mostly because I just realized I blew my whole morning working on this when all I really wanted to do was avoid discussing fchmodat in the first place :)