On 1/29/19 2:10 PM, Jann Horn wrote:
> On Tue, Jan 29, 2019 at 9:56 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 1/29/19 1:47 PM, Jann Horn wrote:
>>> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> Add support for a polled io_uring context. When a read or write is
>>>> submitted to a polled context, the application must poll for completions
>>>> on the CQ ring through io_uring_enter(2). Polled IO may not generate
>>>> IRQ completions, hence they need to be actively found by the application
>>>> itself.
>>>>
>>>> To use polling, io_uring_setup() must be used with the
>>>> IORING_SETUP_IOPOLL flag being set. It is illegal to mix and match
>>>> polled and non-polled IO on an io_uring.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>> [...]
>>>> @@ -102,6 +102,8 @@ struct io_ring_ctx {
>>>>
>>>>  	struct {
>>>>  		spinlock_t		completion_lock;
>>>> +		bool			poll_multi_file;
>>>> +		struct list_head	poll_list;
>>>
>>> Please add a comment explaining what protects poll_list against
>>> concurrent modification, and ideally also put lockdep asserts in the
>>> functions that access the list to allow the kernel to sanity-check the
>>> locking at runtime.
>>
>> Not sure that's needed, and it would be a bit difficult with the SQPOLL
>> thread and non-thread being different cases.
>>
>> But comments I can definitely add.
>>
>>> As far as I understand:
>>> Elements are added by io_iopoll_req_issued(). io_iopoll_req_issued()
>>> can't race with itself because, depending on IORING_SETUP_SQPOLL,
>>> either you have to come through sys_io_uring_enter() (which takes the
>>> uring_lock), or you have to come from the single-threaded
>>> io_sq_thread().
>>> io_do_iopoll() iterates over the list and removes completed items.
>>> io_do_iopoll() is called through io_iopoll_getevents(), which can be
>>> invoked in two ways during normal operation:
>>> - sys_io_uring_enter -> __io_uring_enter -> io_iopoll_check
>>>   -> io_iopoll_getevents; this is only protected by the uring_lock
>>> - io_sq_thread -> io_iopoll_check -> io_iopoll_getevents; this doesn't
>>>   hold any locks
>>> Additionally, the following exit paths:
>>> - io_sq_thread -> io_iopoll_reap_events -> io_iopoll_getevents
>>> - io_uring_release -> io_ring_ctx_wait_and_kill ->
>>>   io_iopoll_reap_events -> io_iopoll_getevents
>>> - io_uring_release -> io_ring_ctx_wait_and_kill -> io_ring_ctx_free
>>>   -> io_iopoll_reap_events -> io_iopoll_getevents
>>
>> Yes, your understanding is correct. But of important note, those two
>> cases don't co-exist. If you are using SQPOLL, then only the thread
>> itself modifies the list. The only valid call of io_uring_enter(2) is
>> to wake up the thread; the task itself will NOT be issuing any IO. If
>> you are NOT using SQPOLL, then any access is inside the ->uring_lock.
>>
>> For the reap cases, we don't enter those at shutdown for SQPOLL, we
>> expect the thread to do it. Hence we wait for the thread to exit before
>> we do our final release.
>>
>>> So as far as I can tell, you can have various races around access to
>>> the poll_list.
>>
>> How did you make that leap?
>
> Ah, you're right, I missed a check when going through
> __io_uring_enter(), never mind.

OK good, thanks for confirming, was afraid I was starting to lose my mind.

-- 
Jens Axboe