On 27/05/2020 01:04, Jann Horn wrote:
> On Tue, May 26, 2020 at 8:11 PM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
>> It looks like taking ->uring_lock should work like kind of grace
>> period for struct files_struct and io_uring_flush(), and that would
>> solve the race with "fcheck(ctx->ring_fd) == ctx->ring_file".
>>
>> Can you take a look? If you like it, I'll send a proper patch
>> and a bunch of cleanups on top.
>>
>>
>> diff --git a/fs/io_uring.c b/fs/io_uring.c
>> index a3dbd5f40391..012af200dc72 100644
>> --- a/fs/io_uring.c
>> +++ b/fs/io_uring.c
>> @@ -5557,12 +5557,11 @@ static int io_grab_files(struct io_kiocb *req)
>>           * the fd has changed since we started down this path, and disallow
>>           * this operation if it has.
>>           */
>> -        if (fcheck(ctx->ring_fd) == ctx->ring_file) {
>> -                list_add(&req->inflight_entry, &ctx->inflight_list);
>> -                req->flags |= REQ_F_INFLIGHT;
>> -                req->work.files = current->files;
>> -                ret = 0;
>> -        }
>> +        list_add(&req->inflight_entry, &ctx->inflight_list);
>> +        req->flags |= REQ_F_INFLIGHT;
>> +        req->work.files = current->files;
>> +        ret = 0;
>> +
>>          spin_unlock_irq(&ctx->inflight_lock);
>>          rcu_read_unlock();
>>
>> @@ -7479,6 +7478,10 @@ static int io_uring_release(struct inode *inode, struct file *file)
>>  static void io_uring_cancel_files(struct io_ring_ctx *ctx,
>>                                    struct files_struct *files)
>>  {
>> +        /* wait for all submitters that can race for @files */
>> +        mutex_lock(&ctx->uring_lock);
>> +        mutex_unlock(&ctx->uring_lock);
>> +
>>          while (!list_empty_careful(&ctx->inflight_list)) {
>>                  struct io_kiocb *cancel_req = NULL, *req;
>>                  DEFINE_WAIT(wait);
>
> First off: You're removing a check in io_grab_files() without changing
> the comment that describes the check; and the new comment you're
> adding in io_uring_cancel_files() is IMO too short to be useful.

Obviously, it was stripped down to show the idea; nobody is talking
about committing it as is. I hoped Jens remembered it well enough to
understand. Let me describe it in more detail then:

> I'm trying to figure out how your change is supposed to work, and I
> don't get it. If a submitter is just past fdget() (at which point no
> locks are held), the ->flush() caller can instantly take and drop the
> ->uring_lock, and then later the rest of the submission path will grab
> an unprotected pointer to the files_struct. Am I missing something?

old = tsk->files;
task_lock(tsk);
tsk->files = files;
task_unlock(tsk);
put_files_struct(old); /* i.e. ->flush(old) */

This is from reset_files_struct(), and I presume the whole idea of
io_uring's ->flush() is to protect against racing with @old files
going away like that, i.e. to ensure that no io_uring requests are
left holding @old files.

The only place where current->files is accessed and copied by io_uring
is io_grab_files(), which is called in the submission path. And the
whole submission path runs under @uring_mtx.

For your case, the submitter will take @uring_mtx only after the
lock/unlock in the flush has happened, so it won't see the old files
(happens-before by locking the mutex).

The idea behind the lock/unlock is:

- submitters that locked @uring_mtx (i.e. started submission) before
  the lock/unlock are waited for in the flush. These can potentially
  access @old.
- submitters that come after the lock/unlock won't see @old files.

So, no new request associated with @old can appear after that. All
that's left is to deal with the already submitted requests, and that's
done by the rest of io_uring_cancel_files().
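To make the happens-before argument concrete, below is a minimal,
self-contained sketch of the pattern (not the actual io_uring code;
submit_side()/flush_side() and the standalone @uring_mtx are made-up
names for illustration):

#include <linux/mutex.h>

static DEFINE_MUTEX(uring_mtx);

/*
 * Submission path: everything that reads current->files
 * (io_grab_files() in the real code) runs inside the critical section.
 */
static void submit_side(void)
{
        mutex_lock(&uring_mtx);
        /* ... grab current->files, mark the request inflight ... */
        mutex_unlock(&uring_mtx);
}

/* Flush path: the empty lock/unlock pair acts as a grace period. */
static void flush_side(void)
{
        /*
         * Returns only once every submitter that took the mutex before
         * us has dropped it; submitters that take it after us observe
         * the new ->files (happens-before via the mutex).
         */
        mutex_lock(&uring_mtx);
        mutex_unlock(&uring_mtx);

        /* ... cancel the requests submitted before the grace period ... */
}

The empty critical section carries no data by itself; the mutual
exclusion alone is what orders the flush against in-flight submitters.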
The one thing I don't know is why current->files was originally
accessed without protection in io_grab_files(); presumably the
rcu_read_lock() there is for that reason. Do you see anything
suspicious in the description above?

--
Pavel Begunkov