On 8/12/24 2:15 PM, Olivier Langlois wrote:
> On Mon, 2024-08-12 at 12:11 -0600, Jens Axboe wrote:
>> On 8/12/24 12:10 PM, Jens Axboe wrote:
>>> On 8/11/24 7:00 PM, Olivier Langlois wrote:
>>>> On Sun, 2024-08-11 at 20:34 -0400, Olivier Langlois wrote:
>>>>> io_napi_entry() has 2 calling sites. One of them is unlikely to
>>>>> find an entry, and if it does, the timeout arguably should not
>>>>> be updated.
>>>>>
>>>>> The other io_napi_entry() calling site overwrites the update
>>>>> made by io_napi_entry(), so the io_napi_entry() timeout update
>>>>> has little or no value and is therefore removed.
>>>>>
>>>>> Signed-off-by: Olivier Langlois <olivier@xxxxxxxxxxxxxx>
>>>>> ---
>>>>>  io_uring/napi.c | 1 -
>>>>>  1 file changed, 1 deletion(-)
>>>>>
>>>>> diff --git a/io_uring/napi.c b/io_uring/napi.c
>>>>> index 73c4159e8405..1de1d4d62925 100644
>>>>> --- a/io_uring/napi.c
>>>>> +++ b/io_uring/napi.c
>>>>> @@ -26,7 +26,6 @@ static struct io_napi_entry *io_napi_hash_find(struct hlist_head *hash_list,
>>>>>  	hlist_for_each_entry_rcu(e, hash_list, node) {
>>>>>  		if (e->napi_id != napi_id)
>>>>>  			continue;
>>>>> -		e->timeout = jiffies + NAPI_TIMEOUT;
>>>>>  		return e;
>>>>>  	}
>>>>>
>>>> I am commenting on my own patch because I found something curious
>>>> that I was not sure about when I was reviewing the code.
>>>>
>>>> Should the remaining e->timeout assignment be wrapped with a
>>>> WRITE_ONCE() macro to ensure an atomic store?
>>>
>>> I think that makes sense to do, as the lookup can be within RCU,
>>> and hence we have nothing serializing it. Not for torn writes, but
>>> to ensure that the memory sanitizer doesn't complain. I can just
>>> make this change while applying, or send a v2.
>>
>> As a separate patch I mean, not a v2. That part can wait until 6.12.
>>
> ok. np. I'll look into it soon.
>
> In the meantime, I have detected a few suspicious things in the napi
> code.
>
> I am reporting them here to have a few extra eyeballs looking at
> them, to check whether everything is fine or not.
>
> 1. in __io_napi_remove_stale(),
>
> is it ok to use hash_for_each() instead of hash_for_each_safe()?
>
> it might be ok because it is a hash_del_rcu() and not a simple
> hash_del(), but I have little experience with possible RCU shortcuts,
> so I am unsure on this one...

It should use hash_for_each_rcu(), and I think for good measure, we
should just keep it inside the RCU region. Then we know for a fact
that the deletion doesn't run.

> 2. in io_napi_free(),
>
> list_del(&e->list); is not called. Can the only reason be that
> io_napi_free() is called as part of the ring destruction, so it is
> an optimization to not clear the list since it is not expected to be
> reused?
>
> would calling INIT_LIST_HEAD() before exiting, as an extra
> precaution to make the function future proof in case it is reused in
> another context than the ring destruction, be a good idea?

I think that's just an oversight, and it doesn't matter since it's
all going away anyway. But it would be prudent to delete it
regardless!

> 3. I am surprised to notice that in __io_napi_do_busy_loop(),
> list_for_each_entry_rcu() is called to traverse the list, but the
> regular methods list_del() and list_add_tail() are called to update
> the list instead of their RCU variants.

They should all just use the RCU variants. Here's a mashup of the
changes. Would be great if you can test - I'll do some too, but it's
always good to have more than one person testing as it tends to hit
more cases.
diff --git a/io_uring/napi.c b/io_uring/napi.c
index d0cf694d0172..6251111a7e1f 100644
--- a/io_uring/napi.c
+++ b/io_uring/napi.c
@@ -81,7 +81,7 @@ void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock)
 	}
 
 	hlist_add_tail_rcu(&e->node, hash_list);
-	list_add_tail(&e->list, &ctx->napi_list);
+	list_add_tail_rcu(&e->list, &ctx->napi_list);
 	spin_unlock(&ctx->napi_lock);
 }
 
@@ -91,9 +91,9 @@ static void __io_napi_remove_stale(struct io_ring_ctx *ctx)
 	unsigned int i;
 
 	spin_lock(&ctx->napi_lock);
-	hash_for_each(ctx->napi_ht, i, e, node) {
+	hash_for_each_rcu(ctx->napi_ht, i, e, node) {
 		if (time_after(jiffies, e->timeout)) {
-			list_del(&e->list);
+			list_del_rcu(&e->list);
 			hash_del_rcu(&e->node);
 			kfree_rcu(e, rcu);
 		}
@@ -174,9 +174,8 @@ static void io_napi_blocking_busy_loop(struct io_ring_ctx *ctx,
 	do {
 		is_stale = __io_napi_do_busy_loop(ctx, loop_end_arg);
 	} while (!io_napi_busy_loop_should_end(iowq, start_time) && !loop_end_arg);
-	rcu_read_unlock();
-
 	io_napi_remove_stale(ctx, is_stale);
+	rcu_read_unlock();
 }
 
 /*
@@ -209,6 +208,7 @@ void io_napi_free(struct io_ring_ctx *ctx)
 	spin_lock(&ctx->napi_lock);
 	hash_for_each(ctx->napi_ht, i, e, node) {
 		hash_del_rcu(&e->node);
+		list_del_rcu(&e->list);
 		kfree_rcu(e, rcu);
 	}
 	spin_unlock(&ctx->napi_lock);
@@ -309,9 +309,8 @@ int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx)
 
 	rcu_read_lock();
 	is_stale = __io_napi_do_busy_loop(ctx, NULL);
-	rcu_read_unlock();
-
 	io_napi_remove_stale(ctx, is_stale);
+	rcu_read_unlock();
 
 	return 1;
 }

-- 
Jens Axboe