Re: [PATCH v5 1/3] io_uring: add napi busy polling support

On 11/21/22 12:45 PM, Jens Axboe wrote:
> On 11/21/22 12:14 PM, Stefan Roesch wrote:
>> +/*
>> + * io_napi_add() - Add napi id to the busy poll list
>> + * @file: file pointer for socket
>> + * @ctx:  io-uring context
>> + *
>> + * Add the napi id of the socket to the napi busy poll list.
>> + */
>> +void io_napi_add(struct file *file, struct io_ring_ctx *ctx)
>> +{
>> +	unsigned int napi_id;
>> +	struct socket *sock;
>> +	struct sock *sk;
>> +	struct io_napi_entry *ne;
>> +
>> +	if (!io_napi_busy_loop_on(ctx))
>> +		return;
>> +
>> +	sock = sock_from_file(file);
>> +	if (!sock)
>> +		return;
>> +
>> +	sk = sock->sk;
>> +	if (!sk)
>> +		return;
>> +
>> +	napi_id = READ_ONCE(sk->sk_napi_id);
>> +
>> +	/* Non-NAPI IDs can be rejected */
>> +	if (napi_id < MIN_NAPI_ID)
>> +		return;
>> +
>> +	spin_lock(&ctx->napi_lock);
>> +	list_for_each_entry(ne, &ctx->napi_list, list) {
>> +		if (ne->napi_id == napi_id) {
>> +			ne->timeout = jiffies + NAPI_TIMEOUT;
>> +			goto out;
>> +		}
>> +	}
>> +
>> +	ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
>> +	if (!ne)
>> +		goto out;
>> +
>> +	ne->napi_id = napi_id;
>> +	ne->timeout = jiffies + NAPI_TIMEOUT;
>> +	list_add_tail(&ne->list, &ctx->napi_list);
>> +
>> +out:
>> +	spin_unlock(&ctx->napi_lock);
>> +}
> 
> I think this all looks good now, just one minor comment on the above. Is
> the expectation here that we'll basically always add to the napi list?
> If so, then I think allocating 'ne' outside the spinlock would be a lot
> saner, and then just kfree() it for the unlikely case where we find a
> duplicate.
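
For reference, that suggestion would amount to roughly the below for the
tail of io_napi_add(), after the existing napi_id checks (a rough sketch
only, not the posted patch, and untested):

	struct io_napi_entry *ne, *e;

	/* allocate outside the lock; free it if we find a duplicate */
	ne = kmalloc(sizeof(*ne), GFP_NOWAIT);
	if (!ne)
		return;
	ne->napi_id = napi_id;
	ne->timeout = jiffies + NAPI_TIMEOUT;

	spin_lock(&ctx->napi_lock);
	list_for_each_entry(e, &ctx->napi_list, list) {
		if (e->napi_id == napi_id) {
			/* duplicate: refresh timeout, drop our entry */
			e->timeout = jiffies + NAPI_TIMEOUT;
			spin_unlock(&ctx->napi_lock);
			kfree(ne);
			return;
		}
	}
	list_add_tail(&ne->list, &ctx->napi_list);
	spin_unlock(&ctx->napi_lock);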

After thinking about this a bit more, I don't think this is done in the
optimal fashion. If the list is longer than a few entries, this check
(or check-alloc-insert) is pretty expensive, and it'll add substantial
overhead to the poll path for sockets if napi is enabled.

I think we should do something along these lines (see the sketch below
the list):

1) When arming poll AND napi has been enabled for the ring, then
    alloc io_napi_entry upfront and store it in ->async_data.

2) Maintain the state in the io_napi_entry. If we're on the list,
    that can be checked with just list_empty(), for example. If not
    on the list, assign timeout and add.

3) Have regular request cleanup free it.

This could be combined with an alloc cache, though I would not do that
for the first iteration.
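
Roughly, points 1 and 3 could look like the below. io_napi_arm() and
io_napi_free() are made-up names here, the struct just mirrors the one
in the patch, and the real hook points in the poll arming and cleanup
paths would need to be picked properly:

	/* allocated at poll arm time and stored in req->async_data */
	struct io_napi_entry {
		struct list_head	list;	/* empty while not on ctx->napi_list */
		unsigned int		napi_id;
		unsigned long		timeout;
	};

	/* 1) at poll arming, if napi is enabled for the ring */
	static int io_napi_arm(struct io_kiocb *req)
	{
		struct io_napi_entry *ne;

		ne = kmalloc(sizeof(*ne), GFP_KERNEL);
		if (!ne)
			return -ENOMEM;
		INIT_LIST_HEAD(&ne->list);
		req->async_data = ne;
		return 0;
	}

	/* 3) from the regular request cleanup path */
	static void io_napi_free(struct io_kiocb *req, struct io_ring_ctx *ctx)
	{
		struct io_napi_entry *ne = req->async_data;

		if (!ne)
			return;
		spin_lock(&ctx->napi_lock);
		if (!list_empty(&ne->list))
			list_del_init(&ne->list);
		spin_unlock(&ctx->napi_lock);
		kfree(ne);
		req->async_data = NULL;
	}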

This would make io_napi_add() cheap: no more list iteration, and no
more allocations. And that is arguably the most important part, as it
is called every time the poll is woken up. Particularly for multishot
that makes a big difference.
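
The hot path from point 2 would then shrink to something like this
(again just a sketch; the signature is illustrative, and the napi_id
would still come from the socket as in the current patch):

	void io_napi_add(struct io_kiocb *req, struct io_ring_ctx *ctx,
			 unsigned int napi_id)
	{
		struct io_napi_entry *ne = req->async_data;

		if (!ne || napi_id < MIN_NAPI_ID)
			return;

		ne->napi_id = napi_id;
		ne->timeout = jiffies + NAPI_TIMEOUT;

		/* on-list state is just list_empty(): no iteration, no alloc */
		if (list_empty(&ne->list)) {
			spin_lock(&ctx->napi_lock);
			if (list_empty(&ne->list))
				list_add_tail(&ne->list, &ctx->napi_list);
			spin_unlock(&ctx->napi_lock);
		}
	}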

It's also designed much better imho, moving the more expensive bits to
the setup side.

-- 
Jens Axboe


