On 11/21/22 12:45 PM, Jens Axboe wrote:
> On 11/21/22 12:14 PM, Stefan Roesch wrote:
>> +/*
>> + * io_napi_add() - Add napi id to the busy poll list
>> + * @file: file pointer for socket
>> + * @ctx: io-uring context
>> + *
>> + * Add the napi id of the socket to the napi busy poll list.
>> + */
>> +void io_napi_add(struct file *file, struct io_ring_ctx *ctx)
>> +{
>> +	unsigned int napi_id;
>> +	struct socket *sock;
>> +	struct sock *sk;
>> +	struct io_napi_entry *ne;
>> +
>> +	if (!io_napi_busy_loop_on(ctx))
>> +		return;
>> +
>> +	sock = sock_from_file(file);
>> +	if (!sock)
>> +		return;
>> +
>> +	sk = sock->sk;
>> +	if (!sk)
>> +		return;
>> +
>> +	napi_id = READ_ONCE(sk->sk_napi_id);
>> +
>> +	/* Non-NAPI IDs can be rejected */
>> +	if (napi_id < MIN_NAPI_ID)
>> +		return;
>> +
>> +	spin_lock(&ctx->napi_lock);
>> +	list_for_each_entry(ne, &ctx->napi_list, list) {
>> +		if (ne->napi_id == napi_id) {
>> +			ne->timeout = jiffies + NAPI_TIMEOUT;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
>> +	if (!ne)
>> +		goto out;
>> +
>> +	ne->napi_id = napi_id;
>> +	ne->timeout = jiffies + NAPI_TIMEOUT;
>> +	list_add_tail(&ne->list, &ctx->napi_list);
>> +
>> +out:
>> +	spin_unlock(&ctx->napi_lock);
>> +}
> 
> I think this all looks good now, just one minor comment on the above. Is
> the expectation here that we'll basically always add to the napi list?
> If so, then I think allocating 'ne' outside the spinlock would be a lot
> saner, and then just kfree() it for the unlikely case where we find a
> duplicate.

After thinking about this a bit more, I don't think this is done in the
most optimal fashion. If the list is longer than a few entries, this
check (or check-alloc-insert) is pretty expensive, and it'll add
substantial overhead to the poll path for sockets if napi is enabled.

I think we should do something a la:

1) When arming poll AND napi has been enabled for the ring, then alloc
   io_napi_entry upfront and store it in ->async_data.

2) Maintain the state in the io_napi_entry. If we're on the list, that
   can be checked with just list_empty(), for example. If not on the
   list, assign timeout and add.

3) Have regular request cleanup free it.

This could be combined with an alloc cache, though I would not do that
for the first iteration.

This would make io_napi_add() cheap - no more list iteration, and no
more allocations. And that is arguably the most important part, as it
is called every time the poll is woken up. Particularly for multishot,
that makes a big difference. It's also designed much better imho,
moving the more expensive bits to the setup side.
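Completely untested and with the cross-CPU synchronization hand-waved,
but roughly along these lines. io_napi_prep_async() and
io_napi_req_cleanup() are made-up names just to show where the pieces
would hang off the arm/cleanup paths, and anything reaping stale
entries would need list_del_init() so the list_empty() check below
stays valid:

/*
 * At poll arming time, if napi is enabled for the ring. Can use
 * GFP_KERNEL here rather than GFP_NOWAIT in the wakeup path.
 */
static int io_napi_prep_async(struct io_kiocb *req, struct io_ring_ctx *ctx)
{
	struct io_napi_entry *ne;

	if (!io_napi_busy_loop_on(ctx))
		return 0;

	ne = kmalloc(sizeof(*ne), GFP_KERNEL);
	if (!ne)
		return -ENOMEM;

	/* empty list_head doubles as the "not on napi_list" state */
	INIT_LIST_HEAD(&ne->list);
	req->async_data = ne;
	return 0;
}

/* called on every poll wakeup - no iteration, no allocation */
void io_napi_add(struct io_kiocb *req, struct io_ring_ctx *ctx)
{
	struct io_napi_entry *ne = req->async_data;
	struct socket *sock;
	unsigned int napi_id;

	/* no entry means napi wasn't enabled when the poll was armed */
	if (!ne)
		return;

	sock = sock_from_file(req->file);
	if (!sock || !sock->sk)
		return;

	napi_id = READ_ONCE(sock->sk->sk_napi_id);
	if (napi_id < MIN_NAPI_ID)
		return;

	ne->napi_id = napi_id;
	ne->timeout = jiffies + NAPI_TIMEOUT;

	/* only the first wakeup pays for the lock and the list add */
	if (list_empty(&ne->list)) {
		spin_lock(&ctx->napi_lock);
		list_add_tail(&ne->list, &ctx->napi_list);
		spin_unlock(&ctx->napi_lock);
	}
}

/* regular request cleanup */
void io_napi_req_cleanup(struct io_kiocb *req, struct io_ring_ctx *ctx)
{
	struct io_napi_entry *ne = req->async_data;

	if (!ne)
		return;

	spin_lock(&ctx->napi_lock);
	if (!list_empty(&ne->list))
		list_del(&ne->list);
	spin_unlock(&ctx->napi_lock);

	kfree(ne);
	req->async_data = NULL;
}

That keeps io_napi_add() allocation-free, and after the first wakeup it
doesn't even need to touch the lock anymore.

-- 
Jens Axboe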