Re: [RFC] do_iopoll() and *grab_env()

On 12/06/2020 21:02, Jens Axboe wrote:
> On 6/12/20 11:55 AM, Jens Axboe wrote:
>> On 6/12/20 11:30 AM, Pavel Begunkov wrote:
>>> On 12/06/2020 20:02, Jens Axboe wrote:
>>>> On 6/11/20 9:54 AM, Pavel Begunkov wrote:
>>>>> io_do_iopoll() can async punt a request with io_queue_async_work(),
>>>>> which in turn does io_req_work_grab_env(). The problem is that iopoll() can
>>>>> be called from who knows what context, e.g. from a completely
>>>>> different process with its own memory space, creds, etc.
>>>>>
>>>>> io_do_iopoll() {
>>>>> 	ret = req->poll();
>>>>> 	if (ret == -EAGAIN)
>>>>> 		io_queue_async_work()
>>>>> 	...
>>>>> }
>>>>>
>>>>>
>>>>> I can't find it handled in io_uring. Can this even happen?
>>>>> Wouldn't it be better to complete them with -EAGAIN?
>>>>
>>>> I don't think a plain -EAGAIN complete would be very useful, it's kind
>>>> of a shitty thing to pass back to userspace when it can be avoided. For
>>>> polled IO, we know we're doing O_DIRECT, or using fixed buffers. For the
>>>> latter, there's no problem in retrying, regardless of context. For the
>>>> former, I think we'd get -EFAULT mapping the IO at that point, which is
>>>> probably reasonable. I'd need to double check, though.
>>>
>>> It's shitty, but -EFAULT is the best outcome. I care more about not
>>> corrupting another process' memory if addresses coincide. AFAIK it can
>>> happen because io_{read,write} will use iovecs for punted re-submission.
>>>
>>>
>>> Calling async_prep() unconditionally in advance is too heavy to be good. I'd
>>> love to see something more clever, but with -EAGAIN users at least can handle it.
>>
>> So how about we just grab ->task for the initial issue, and retry if we
>> find it through -EAGAIN and ->task == current. That'll be the most
>> common case, by far, and it'll prevent passing back -EAGAIN when we
>> really don't have to. If the task is different, then -EAGAIN makes more
>> sense, because at that point we're passing back -EAGAIN because we
>> really cannot feasibly handle it rather than just as a convenience.

Yeah, I was even thinking of dragging it through task_work just to call
*grab_env() there. Looks reasonable to me.

> Something like this, totally untested. And wants a comment too.

Looks like it. Would you leave this to me? There is another issue with
cancellation requiring ->task. It'd be easier to keep them together.

> 
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 155f3d830ddb..15806f71b33e 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -1727,6 +1728,12 @@ static int io_put_kbuf(struct io_kiocb *req)
>  	return cflags;
>  }
>  
> +static inline void req_set_fail_links(struct io_kiocb *req)
> +{
> +	if ((req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) == REQ_F_LINK)
> +		req->flags |= REQ_F_FAIL_LINK;
> +}
> +
>  /*
>   * Find and free completed poll iocbs
>   */
> @@ -1767,8 +1774,14 @@ static void io_iopoll_queue(struct list_head *again)
>  	do {
>  		req = list_first_entry(again, struct io_kiocb, list);
>  		list_del(&req->list);
> -		refcount_inc(&req->refs);
> -		io_queue_async_work(req);
> +		if (req->task == current) {
> +			refcount_inc(&req->refs);
> +			io_queue_async_work(req);
> +		} else {
> +			io_cqring_add_event(req, -EAGAIN);
> +			req_set_fail_links(req);
> +			io_put_req(req);
> +		}
>  	} while (!list_empty(again));
>  }
>  
> @@ -1937,12 +1950,6 @@ static void kiocb_end_write(struct io_kiocb *req)
>  	file_end_write(req->file);
>  }
>  
> -static inline void req_set_fail_links(struct io_kiocb *req)
> -{
> -	if ((req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) == REQ_F_LINK)
> -		req->flags |= REQ_F_FAIL_LINK;
> -}
> -
>  static void io_complete_rw_common(struct kiocb *kiocb, long res)
>  {
>  	struct io_kiocb *req = container_of(kiocb, struct io_kiocb, rw.kiocb);
> @@ -2137,6 +2144,8 @@ static int io_prep_rw(struct io_kiocb *req, const struct io_uring_sqe *sqe,
>  
>  		kiocb->ki_flags |= IOCB_HIPRI;
>  		kiocb->ki_complete = io_complete_rw_iopoll;
> +		req->task = current;
> +		get_task_struct(current);
>  		req->result = 0;
>  		req->iopoll_completed = 0;
>  	} else {
> 
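
For reference, the decision the patch above makes can be modelled in plain
userspace C. This is only a sketch under stated assumptions: struct task,
struct request, the requeued field, set_fail_links() and
iopoll_handle_eagain() are hypothetical stand-ins, not the real io_uring
structures or helpers, and the flag values are made up. It shows the
"retry only if ->task == current, otherwise complete with -EAGAIN" rule
and the fail-links check, nothing more.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; the kernel uses struct task_struct. */
struct task { int pid; };

/* Illustrative flag values only; the kernel's REQ_F_* bits differ. */
#define REQ_F_LINK      (1u << 0)
#define REQ_F_HARDLINK  (1u << 1)
#define REQ_F_FAIL_LINK (1u << 2)

/* Hypothetical, heavily reduced model of a polled request. */
struct request {
	const struct task *task; /* submitter, recorded at prep time */
	unsigned int flags;
	int result;              /* completion result, 0 while pending */
	bool requeued;           /* true if punted for an async retry */
};

/* Mirrors the check in the patch: soft links fail, hard links do not. */
static void set_fail_links(struct request *req)
{
	if ((req->flags & (REQ_F_LINK | REQ_F_HARDLINK)) == REQ_F_LINK)
		req->flags |= REQ_F_FAIL_LINK;
}

/*
 * Called when polling a request returned -EAGAIN.
 * Same task as the submitter: safe to retry, its mm and creds are ours.
 * Different task: do not touch its environment, complete with -EAGAIN.
 */
static void iopoll_handle_eagain(struct request *req,
				 const struct task *current_task)
{
	if (req->task == current_task) {
		req->requeued = true;
	} else {
		req->result = -EAGAIN;
		set_fail_links(req);
	}
}
```

A request prepared by task A and polled by task A ends up requeued with
result still 0; the same request polled by task B is completed with
-EAGAIN, and REQ_F_FAIL_LINK is set only if it was a soft link.
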

-- 
Pavel Begunkov
