Re: [PATCH 1/3] io_uring: add support for async work inheriting files table

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/17/19 8:41 PM, Jann Horn wrote:
> On Fri, Oct 18, 2019 at 4:01 AM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> This is in preparation for adding opcodes that need to modify files
>> in a process file table, either adding new ones or closing old ones.
> 
> Closing old ones would be tricky. Basically if you call
> get_files_struct() while you're between an fdget()/fdput() pair (e.g.
> from sys_io_uring_enter()), you're not allowed to use that
> files_struct reference to replace or close existing FDs through that
> reference. (Or more accurately, if you go through fdget() with
> files_struct refcount 1, you must not replace/close FDs in there in
> any way until you've passed the corresponding fdput().)
> 
> You can avoid that if you ensure that you never use fdget()/fdput() in
> the relevant places, only fget()/fput().

That's a good point, I didn't think the closing aspect through when
writing that changelog. File addition is the most interesting aspect,
obviously, and the only part that I care about in this patch set. I'll
change the wording.

>> If an opcode needs this, it must set REQ_F_NEED_FILES in the request
>> structure. If work that needs to get punted to async context have this
>> set, they will grab a reference to the process file table. When the
>> work is completed, the reference is dropped again.
> [...]
>> @@ -2220,6 +2223,10 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>                                  set_fs(USER_DS);
>>                          }
>>                  }
>> +               if (s->files && !old_files) {
>> +                       old_files = current->files;
>> +                       current->files = s->files;
>> +               }
> 
> AFAIK e.g. stuff like proc_fd_link() in procfs can concurrently call
> get_files_struct() even on kernel tasks, so you should take the
> task_lock(current) while fiddling with the ->files pointer.

Fixed up, thanks!

> Also, maybe I'm too tired to read this correctly, but it seems like
> when io_sq_wq_submit_work() is processing multiple elements with
> ->files pointers, this part will only consume a reference to the first
> one?

Like the mm, we should only have the one file table. But there's no
reason to not handle this properly, I've amended the commit to properly
swap so it works for any number of file tables.

>>                  if (!ret) {
>>                          s->has_user = cur_mm != NULL;
>> @@ -2312,6 +2319,11 @@ static void io_sq_wq_submit_work(struct work_struct *work)
>>                  unuse_mm(cur_mm);
>>                  mmput(cur_mm);
>>          }
>> +       if (old_files) {
>> +               struct files_struct *files = current->files;
>> +               current->files = old_files;
>> +               put_files_struct(files);
>> +       }
> 
> And then here the first files_struct reference is dropped, and the
> rest of them leak?

Fixed with the above change.

>> @@ -2413,6 +2425,8 @@ static int __io_queue_sqe(struct io_ring_ctx *ctx, struct io_kiocb *req,
>>
>>                          s->sqe = sqe_copy;
>>                          memcpy(&req->submit, s, sizeof(*s));
>> +                       if (req->flags & REQ_F_NEED_FILES)
>> +                               req->submit.files = get_files_struct(current);
> 
> Stupid question: How does this interact with sqpoll mode? In that
> case, this function is running on a kernel thread that isn't sharing
> the application's files_struct, right?

Not a stupid question! It doesn't work with sqpoll. We need to be
entered on behalf of the task, and we never see that with sqpoll (except
if NEED_WAKE is set in flags).

For now I'll just forbid it explicitly in io_accept(), just like we do
for IORING_SETUP_IOPOLL.

Updated patch1:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=df6caac708dae8ee9a74c9016e479b02ad78d436

and patch 3:

http://git.kernel.dk/cgit/linux-block/commit/?h=for-5.5/io_uring-test&id=442bb35fc4f8f28c29ea220475c45babb44ee49c

-- 
Jens Axboe




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux