On Thu, May 25, 2023 at 08:00:02PM +0900, Dominique Martinet wrote:
> Christian Brauner wrote on Thu, May 25, 2023 at 11:22:08AM +0200:
> > > What was confusing is that default_llseek updates f_pos under the
> > > inode_lock (write), and getdents also takes that lock (for read only in
> > > shared implem), so I assumed getdents also was just protected by this
> > > read lock, but I guess that was a bad assumption (as I kept pointing
> > > out, a shared read lock isn't good enough, we definitely agree there)
> > >
> > >
> > > In practice, in the non-registered file case io_uring is also calling
> > > fdget, so the lock is held exactly the same as the syscall and I wasn't
> >
> > No, it really isn't. fdget() doesn't take f_pos_lock at all:
> >
> > fdget()
> > -> __fdget()
> >    -> __fget_light()
> >       -> __fget()
> >          -> __fget_files()
> >             -> __fget_files_rcu()
>
> Ugh, I managed to not notice that I was looking at fdget_pos and that
> it's not the same as fdget by the time I wrote two paragraphs... These
> functions all have too many wrappers and too similar names for a quick
> look before work.
>
> > If that were true then any system call that passes an fd and uses
> > fdget() would try to acquire a mutex on f_pos_lock. We'd be serializing
> > every *at based system call on f_pos_lock whenever we have multiple fds
> > referring to the same file trying to operate on it concurrently.
> >
> > We do have fdget_pos() and fdput_pos() as a special purpose fdget() for
> > a select group of system calls that require this synchronization.
>
> Right, that makes sense, and invalidates everything I said after that
> anyway but it's not like looking stupid ever killed anyone.

I strongly disagree with the looking stupid part. These callchains are
quite unwieldy and it's easy to get confused. Usually if you receive a
long mail about the semantics involved - as in the earlier thread - it
means there's landmines all over.
>
> Ok so it would require adding a new wrapper from struct file to struct
> fd that'd eventually take the lock and set FDPUT_POS_UNLOCK for... not
> fdput_pos but another function for that stopping short of fdput...
> Then just call that around both vfs_llseek and vfs_getdents calls; which
> is the easy part.
>
> (Or possibly call mutex_lock directly like Dylan did in [1]...)
> [1] https://lore.kernel.org/all/20220222105504.3331010-1-dylany@xxxxxx/T/#m3609dc8057d0bc8e41ceab643e4d630f7b91bde6

We'd need a consistent story whatever it ends up being.

> I'll be honest though I'm thankful for your explanations but I think
> I'll just do like Stefan and stop trying for now: the only reason I've
> started this was because I wanted to play with io_uring for a new toy
> project and it felt awkward without a getdents for crawling a tree; and
> I'm long past the point where I should have thrown the towel and just
> make that a sequential walk.
> There's too many "conditional patches" (NOWAIT, end of dir indicator)
> that I don't care about and require additional work to rebase
> continuously so I'll just leave it up to someone else who does care.
>
> So to that someone: feel free to continue from these branches (I've
> included the fix for kernfs_fop_readdir that Dan Carpenter reported):
> https://github.com/martinetd/linux/commits/io_uring_getdents
> https://github.com/martinetd/liburing/commits/getdents
>
> Or just start over, there's not that much code now hopefully the
> baseline requirements have gotten a little bit clearer.
>
>
> Sorry for stirring the mess and leaving halfway, if nobody does continue
> I might send a v3 when I have more time/energy in a few months, but it
> won't be quick.

It's fine.
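
For anyone picking this up, the wrapper idea quoted above (take the
position lock once, then call it around both the llseek and the getdents
paths) can be modelled in plain userspace C. This is only a rough sketch
under loud assumptions: struct model_file and every model_* helper are
made-up names, a pthread mutex stands in for f_pos_lock, and the
"directory" is just a counter. It is not kernel code and not a proposal
for the actual helper; it only shows the shape of the locking.

```c
#include <assert.h>
#include <pthread.h>

/* Userspace stand-in for struct file (hypothetical, NOT the kernel type).
 * Only the pieces relevant to the f_pos discussion are modelled. */
struct model_file {
	pthread_mutex_t f_pos_lock;	/* stands in for file->f_pos_lock */
	long f_pos;			/* current position in the directory */
	int nr_entries;			/* pretend directory size */
};

#define MODEL_FILE_INIT(n) { PTHREAD_MUTEX_INITIALIZER, 0, (n) }

/* Hypothetical lock helpers: the "new wrapper" from the mail, reduced to
 * a plain mutex lock/unlock pair. */
static void model_file_pos_lock(struct model_file *f)
{
	pthread_mutex_lock(&f->f_pos_lock);
}

static void model_file_pos_unlock(struct model_file *f)
{
	pthread_mutex_unlock(&f->f_pos_lock);
}

/* Unlocked workers, playing the role of vfs_llseek/vfs_getdents: they
 * touch f_pos and rely on the caller for serialization. */
static long model_vfs_llseek(struct model_file *f, long off)
{
	if (off < 0 || off > f->nr_entries)
		return -1;
	f->f_pos = off;
	return off;
}

static int model_vfs_getdents(struct model_file *f, int max)
{
	int n = 0;

	assert(max >= 0);
	while (n < max && f->f_pos < f->nr_entries) {
		f->f_pos++;	/* "emit" one entry */
		n++;
	}
	return n;
}

/* Callers that do not come through an fdget_pos()-like path take the
 * position lock themselves, around both operations. */
static long model_uring_llseek(struct model_file *f, long off)
{
	long ret;

	model_file_pos_lock(f);
	ret = model_vfs_llseek(f, off);
	model_file_pos_unlock(f);
	return ret;
}

static int model_uring_getdents(struct model_file *f, int max)
{
	int ret;

	model_file_pos_lock(f);
	ret = model_vfs_getdents(f, max);
	model_file_pos_unlock(f);
	return ret;
}
```

The point of keeping the workers unlocked and putting the lock in the
callers is that the same lock then covers both position updates, which
is the "consistent story" part; whether the real thing should be a new
helper or a direct mutex_lock is left open here, as it is in the thread.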