Re: [PATCH v2 1/6] fs: split off vfs_getdents function of getdents64 syscall

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/25/23 19:00, Dominique Martinet wrote:
Christian Brauner wrote on Thu, May 25, 2023 at 11:22:08AM +0200:
What was confusing is that default_llseek updates f_pos under the
inode_lock (write), and getdents also takes that lock (for read only in
shared implem), so I assumed getdents also was just protected by this
read lock, but I guess that was a bad assumption (as I kept pointing
out, a shared read lock isn't good enough, we definitely agree there)


In practice, in the non-registered file case io_uring is also calling
fdget, so the lock is held exactly the same as the syscall and I wasn't

No, it really isn't. fdget() doesn't take f_pos_lock at all:

fdget()
-> __fdget()
    -> __fget_light()
       -> __fget()
          -> __fget_files()
             -> __fget_files_rcu()

Ugh, I managed to not notice that I was looking at fdget_pos and that
it's not the same as fdget by the time I wrote two paragraphs... These
functions all have too many wrappers and too similar names for a quick
look before work.

If that were true then any system call that passes an fd and uses
fdget() would try to acquire a mutex on f_pos_lock. We'd be serializing
every *at based system call on f_pos_lock whenever we have multiple fds
referring to the same file trying to operate on it concurrently.

We do have fdget_pos() and fdput_pos() as a special purpose fdget() for
a select group of system calls that require this synchronization.

Right, that makes sense, and invalidates everything I said after that
anyway but it's not like looking stupid ever killed anyone.

Ok so it would require adding a new wrapper from struct file to struct
fd that'd eventually take the lock and set FDPUT_POS_UNLOCK for... not
fdput_pos but another function for that stopping short of fdput...
Then just call that around both vfs_llseek and vfs_getdents calls; which
is the easy part.

(Or possibly call mutex_lock directly like Dylan did in [1]...)
[1] https://lore.kernel.org/all/20220222105504.3331010-1-dylany@xxxxxx/T/#m3609dc8057d0bc8e41ceab643e4d630f7b91bde6



I'll be honest though I'm thankful for your explanations but I think
I'll just do like Stefan and stop trying for now: the only reason I've
started this was because I wanted to play with io_uring for a new toy
project and it felt awkward without a getdents for crawling a tree; and
I'm long past the point where I should have thrown the towel and just
make that a sequential walk.
There's too many "conditional patches" (NOWAIT, end of dir indicator)
that I don't care about and require additional work to rebase
continuously so I'll just leave it up to someone else who does care.

So to that someone: feel free to continue from these branches (I've
included the fix for kernfs_fop_readdir that Dan Carpenter reported):
https://github.com/martinetd/linux/commits/io_uring_getdents
https://github.com/martinetd/liburing/commits/getdents

Or just start over, there's not that much code now hopefully the
baseline requirements have gotten a little bit clearer.


Sorry for stirring the mess and leaving halfway, if nobody does continue
I might send a v3 when I have more time/energy in a few months, but it
won't be quick.


Hi Dominique,

I'd like to take this if you don't mind.

Regards,
Hao




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux