On Tue, Jan 29, 2019 at 8:27 PM Jens Axboe <axboe@xxxxxxxxx> wrote:
> Add support for a polled io_uring context. When a read or write is
> submitted to a polled context, the application must poll for completions
> on the CQ ring through io_uring_enter(2). Polled IO may not generate
> IRQ completions, hence they need to be actively found by the application
> itself.
>
> To use polling, io_uring_setup() must be used with the
> IORING_SETUP_IOPOLL flag being set. It is illegal to mix and match
> polled and non-polled IO on an io_uring.
>
> Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
[...]
> @@ -102,6 +102,8 @@ struct io_ring_ctx {
>
>  	struct {
>  		spinlock_t		completion_lock;
> +		bool			poll_multi_file;
> +		struct list_head	poll_list;

Please add a comment explaining what protects poll_list against
concurrent modification, and ideally also put lockdep asserts in the
functions that access the list so that the kernel can sanity-check the
locking at runtime. (A rough sketch of what I mean is at the end of
this mail.)

As far as I understand:

Elements are added by io_iopoll_req_issued(). io_iopoll_req_issued()
can't race with itself because, depending on IORING_SETUP_SQPOLL, you
either have to come through sys_io_uring_enter() (which takes the
uring_lock), or you have to come from the single-threaded
io_sq_thread().

io_do_iopoll() iterates over the list and removes completed items.
io_do_iopoll() is called through io_iopoll_getevents(), which can be
invoked in two ways during normal operation:

 - sys_io_uring_enter -> __io_uring_enter -> io_iopoll_check ->
   io_iopoll_getevents; this is only protected by the uring_lock
 - io_sq_thread -> io_iopoll_check -> io_iopoll_getevents; this
   doesn't hold any locks

Additionally, there are the following exit paths:

 - io_sq_thread -> io_iopoll_reap_events -> io_iopoll_getevents
 - io_uring_release -> io_ring_ctx_wait_and_kill ->
   io_iopoll_reap_events -> io_iopoll_getevents
 - io_uring_release -> io_ring_ctx_wait_and_kill -> io_ring_ctx_free ->
   io_iopoll_reap_events -> io_iopoll_getevents

So as far as I can tell, you can have various races around access to
the poll_list.

[...]
> +static void io_free_req_many(struct io_ring_ctx *ctx, void **reqs, int *nr)
> +{
> +	if (*nr) {
> +		kmem_cache_free_bulk(req_cachep, *nr, reqs);
> +		io_ring_drop_ctx_refs(ctx, *nr);
> +		*nr = 0;
> +	}
> +}
[...]
> +static void io_iopoll_complete(struct io_ring_ctx *ctx, unsigned int *nr_events,
> +			       struct list_head *done)
> +{
> +	void *reqs[IO_IOPOLL_BATCH];
> +	struct io_kiocb *req;
> +	int to_free = 0;
> +
> +	while (!list_empty(done)) {
> +		req = list_first_entry(done, struct io_kiocb, list);
> +		list_del(&req->list);
> +
> +		io_cqring_fill_event(ctx, req->user_data, req->error, 0);
> +
> +		reqs[to_free++] = req;
> +		(*nr_events)++;
> +
> +		fput(req->rw.ki_filp);
> +		if (to_free == ARRAY_SIZE(reqs))
> +			io_free_req_many(ctx, reqs, &to_free);
> +	}
> +	io_commit_cqring(ctx);
> +
> +	if (to_free)
> +		io_free_req_many(ctx, reqs, &to_free);

Nit: You check here whether to_free is zero, and then io_free_req_many()
does that check again. One of the two checks can go; I'd probably
delete this one.

> +}
[...]
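
Concretely, since io_free_req_many() is already a no-op when *nr is
zero, the tail of io_iopoll_complete() could simply become something
like this (just a sketch against the quoted hunk, untested):

	io_commit_cqring(ctx);

	/* io_free_req_many() already handles to_free == 0 */
	io_free_req_many(ctx, reqs, &to_free);
}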
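
And to make the lockdep suggestion above more concrete: I was thinking
of something roughly like the following in the functions that touch
poll_list, e.g. io_iopoll_req_issued() for the add side (untested
sketch only; the elided body and the exact signature are placeholders,
and the asserted locking rule is an assumption, since as described
above it's not clear to me what the rule is supposed to be):

static void io_iopoll_req_issued(struct io_kiocb *req)
{
	/*
	 * Sketch: assert whatever is supposed to protect poll_list.
	 * In this naive form it would trip in the io_sq_thread /
	 * IORING_SETUP_SQPOLL paths, which don't take uring_lock;
	 * that is exactly the kind of thing I'd like lockdep to
	 * either confirm or flag.
	 */
	lockdep_assert_held(&req->ctx->uring_lock);

	/* ... existing code that adds req to ctx->poll_list ... */
}

and similarly in io_do_iopoll() for the removal side.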