On 1/29/19 2:10 PM, Jann Horn wrote:
> On Tue, Jan 29, 2019 at 9:56 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>> On 1/29/19 1:47 PM, Jann Horn wrote:
>>> On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>> Add support for a polled io_uring context. When a read or write is
>>>> submitted to a polled context, the application must poll for completions
>>>> on the CQ ring through io_uring_enter(2). Polled IO may not generate
>>>> IRQ completions, hence they need to be actively found by the application
>>>> itself.
>>>>
>>>> To use polling, io_uring_setup() must be used with the
>>>> IORING_SETUP_IOPOLL flag being set. It is illegal to mix and match
>>>> polled and non-polled IO on an io_uring.
>>>>
>>>> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>>> [...]
>>>> @@ -102,6 +102,8 @@ struct io_ring_ctx {
>>>>
>>>>  	struct {
>>>>  		spinlock_t		completion_lock;
>>>> +		bool			poll_multi_file;
>>>> +		struct list_head	poll_list;
>>>
>>> Please add a comment explaining what protects poll_list against
>>> concurrent modification, and ideally also put lockdep asserts in the
>>> functions that access the list to allow the kernel to sanity-check the
>>> locking at runtime.
>>
>> Not sure that's needed, and it would be a bit difficult with the SQPOLL
>> thread and non-thread being different cases.
>>
>> But comments I can definitely add.
>>
>>> As far as I understand:
>>> Elements are added by io_iopoll_req_issued(). io_iopoll_req_issued()
>>> can't race with itself because, depending on IORING_SETUP_SQPOLL,
>>> either you have to come through sys_io_uring_enter() (which takes the
>>> uring_lock), or you have to come from the single-threaded
>>> io_sq_thread().
>>> io_do_iopoll() iterates over the list and removes completed items.
>>> io_do_iopoll() is called through io_iopoll_getevents(), which can be
>>> invoked in two ways during normal operation:
>>> - sys_io_uring_enter -> __io_uring_enter -> io_iopoll_check
>>>   -> io_iopoll_getevents; this is only protected by the uring_lock
>>> - io_sq_thread -> io_iopoll_check -> io_iopoll_getevents; this doesn't
>>>   hold any locks
>>> Additionally, the following exit paths:
>>> - io_sq_thread -> io_iopoll_reap_events -> io_iopoll_getevents
>>> - io_uring_release -> io_ring_ctx_wait_and_kill ->
>>>   io_iopoll_reap_events -> io_iopoll_getevents
>>> - io_uring_release -> io_ring_ctx_wait_and_kill -> io_ring_ctx_free
>>>   -> io_iopoll_reap_events -> io_iopoll_getevents
>>
>> Yes, your understanding is correct. But of important note, those two
>> cases don't co-exist. If you are using SQPOLL, then only the thread
>> itself modifies the list. The only valid call of io_uring_enter(2) is
>> to wake up the thread; the task itself will NOT be issuing any IO. If
>> you are NOT using SQPOLL, then any access is inside the ->uring_lock.
>>
>> For the reap cases, we don't enter those at shutdown for SQPOLL, we
>> expect the thread to do it. Hence we wait for the thread to exit before
>> we do our final release.
>>
>>> So as far as I can tell, you can have various races around access to
>>> the poll_list.
>>
>> How did you make that leap?
>
> Ah, you're right, I missed a check when going through
> __io_uring_enter(), never mind.

OK good, thanks for confirming, was afraid I was starting to lose my mind.

-- 
Jens Axboe