On Thu, 3 Aug 2023 at 11:03, Christian Brauner <brauner@xxxxxxxxxx> wrote: > > Only thing that's missing is exclusion with seek on directories > as that's the heinous part. Bah. I forgot about lseek entirely, because for some completely stupid reason I just thought "Oh, that will always get the lock". So I guess we'd just have to do that "unconditional fdget_dir()" thing in the header file after all, and make llseek() and ksys_lseek() use it. Bah. And then we'd still have to worry about any filesystem that allows 'read()' and 'write()' on the directory - which can also update f_pos. And yes, those exist. See at least 'adfs_dir_ops', and 'ceph_dir_fops'. They may be broken, but people actually did do things like that historically, maybe there's a reason adfs and ceph allow it. End result: we can forget about fdget_dir(). We'd need to split FMODE_ATOMIC_POS into two instead. I don't think we have any free flags, but I didn't check. The ugly thing to do is to just special-case S_ISDIR. Not lovely, but whatever. So something like this instead? It's a smaller diff anyway, and it gets the crazy afds/ceph cases right too. And by "gets them right" I obviously mean "I didn't actually *TEST* any of this, so who knows..." Linus
fs/file.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/fs/file.c b/fs/file.c index 35c62b54c9d6..dbca26ef7a01 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1036,12 +1036,28 @@ unsigned long __fdget_raw(unsigned int fd) return __fget_light(fd, 0); } +/* + * Try to avoid f_pos locking. We only need it if the + * file is marked for FMODE_ATOMIC_POS, and it can be + * accessed multiple ways. + * + * Always do it for directories, because pidfd_getfd() + * can make a file accessible even if it otherwise would + * not be, and for directories this is a correctness + * issue, not a "POSIX requirement". + */ +static inline bool file_needs_f_pos_lock(struct file *file) +{ + return (file->f_mode & FMODE_ATOMIC_POS) && + (file_count(file) > 1 || S_ISDIR(file_inode(file)->i_mode)); +} + unsigned long __fdget_pos(unsigned int fd) { unsigned long v = __fdget(fd); struct file *file = (struct file *)(v & ~3); - if (file && (file->f_mode & FMODE_ATOMIC_POS)) { + if (file && file_needs_f_pos_lock(file)) { v |= FDPUT_POS_UNLOCK; mutex_lock(&file->f_pos_lock); }