Re: [PATCH] file: always lock position

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 24 Jul 2023 at 11:05, Jens Axboe <axboe@xxxxxxxxx> wrote:
>
> io_uring never does that isn't the original user space creator task, or
> from the io-wq workers that it may create. Those are _always_ normal
> threads. There's no workqueue/kthread usage for IO or file
> getting/putting/installing/removing etc.

That's what I thought. But Christian's comments about the io_uring
getdents work made me worry.

If io_uring does everything right, then the old "file_count(file) > 1"
test in __fdget_pos() - now sadly removed - should work just fine for
any io_uring directory handling.

It may not be obvious when you look at just that test in a vacuum, but
it happens after __fdget_pos() has done the

        unsigned long v = __fdget(fd);
        struct file *file = (struct file *)(v & ~3);

and if it's a threaded app - where io_uring threads count the same -
then the __fdget() in that sequence will have incremented the file
count.

So when it then used to do that

                if (file_count(file) > 1) {

that "> 1" included not just all the references from dup() etc,  but
for any threaded use where we have multiple references to the file
table it also that reference we get from __fdget() -> __fget() ->
__fget_files() -> __fget_files_rcu() -> get_file_rcu() ->
atomic_long_inc_not_zero(&(x)->f_count)).

And yes, that's a fairly long chain, through some subtle code. Doing
"fdget()" (and variations) is not trivial.

Of course, with directory handling, one of the things people have
wanted is to have a "pread()" like model, where you have a position
that is given externally and not tied to the 'struct file'.

We don't have such a thing, and I don't think we can have one, because
I think filesystems also end up having possibly other private data in
the 'struct file' (trivially: the directory cursor for
dcache_dir_open()).

And other filesystems have it per-inode, which is why we have that
"iterate_shared" vs "iterate" horror.

So any dentry thing that wants to do things in parallel needs to open
their own private 'file' just for *that* reason. And even then they
will fail if the filesystem doesn't have that shared directory entry
iterator.

           Linus



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux