On Fri, Feb 07, 2025 at 12:41:17PM +0000, Pavel Begunkov wrote:
> On 2/3/25 15:45, Keith Busch wrote:
> > From: Keith Busch <kbusch@xxxxxxxxxx>
> >
> > Frequent alloc/free cycles on these are pretty costly. Use an io
> > cache to more efficiently reuse these buffers.
> >
> > Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx>
> > ---
> >  include/linux/io_uring_types.h |  16 ++---
> >  io_uring/filetable.c           |   2 +-
> >  io_uring/rsrc.c                | 108 ++++++++++++++++++++++++---------
> >  io_uring/rsrc.h                |   2 +-
> >  4 files changed, 92 insertions(+), 36 deletions(-)
> >
> > diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h
> > index aa661ebfd6568..c0e0c1f92e5b1 100644
> > --- a/include/linux/io_uring_types.h
> > +++ b/include/linux/io_uring_types.h
> > @@ -67,8 +67,17 @@ struct io_file_table {
> >  	unsigned int alloc_hint;
> >  };
> >
> > +struct io_alloc_cache {
> > +	void **entries;
> > +	unsigned int nr_cached;
> > +	unsigned int max_cached;
> > +	size_t elem_size;
> > +};
> > +
> >  struct io_buf_table {
> >  	struct io_rsrc_data data;
> > +	struct io_alloc_cache node_cache;
> > +	struct io_alloc_cache imu_cache;
>
> We can avoid all churn if you kill patch 5/6 and put the caches
> directly into struct io_ring_ctx. It's a bit better for future cache
> improvements and we can even reuse the node cache for files.
>
> ...
> > diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
> > index 864c2eabf8efd..5434b0d992d62 100644
> > --- a/io_uring/rsrc.c
> > +++ b/io_uring/rsrc.c
> > @@ -117,23 +117,39 @@ static void io_buffer_unmap(struct io_ring_ctx *ctx, struct io_rsrc_node *node)
> >  			unpin_user_page(imu->bvec[i].bv_page);
> >  	if (imu->acct_pages)
> >  		io_unaccount_mem(ctx, imu->acct_pages);
> > -	kvfree(imu);
> > +	if (struct_size(imu, bvec, imu->nr_bvecs) >
> > +			ctx->buf_table.imu_cache.elem_size ||
>
> It could be quite a large allocation, let's not cache it if it hasn't
> come from the cache for now. We can always improve on top.

Eh? This already skips inserting into the cache if it wasn't allocated
out of the cache. I picked an arbitrary size, 512b, as the threshold for
caching. If you need more bvecs than fit in that, it falls back to
kvmalloc/kvfree. The allocation overhead is pretty insignificant when
you're transferring large payloads like that, and 14 vectors was chosen
as the tipping point because the resulting struct size fits in a nice
round number.

> And can we invert how it's calculated? See below. You'll have fewer
> calculations in the fast path, and I don't really like users looking
> at ->elem_size when it's not necessary.
>
> #define IO_CACHED_BVEC_SEGS	N

Yah, that's fine.
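
Untested sketch of the inversion, assuming the cache helpers end up
looking like the existing io_alloc_cache_get()/io_alloc_cache_put();
the helper names and the segment count (14, per the 512b threshold
above) are placeholders:

#define IO_CACHED_BVEC_SEGS	14

/* element size the imu cache is initialized with */
#define IO_IMU_CACHE_SIZE \
	struct_size_t(struct io_mapped_ubuf, bvec, IO_CACHED_BVEC_SEGS)

static struct io_mapped_ubuf *io_alloc_imu(struct io_ring_ctx *ctx,
					   int nr_bvecs)
{
	struct io_mapped_ubuf *imu;

	/*
	 * Small mappings come from the cache. On a cache miss, allocate
	 * the full IO_IMU_CACHE_SIZE so the object is safe to recycle
	 * later regardless of nr_bvecs.
	 */
	if (nr_bvecs <= IO_CACHED_BVEC_SEGS) {
		imu = io_alloc_cache_get(&ctx->buf_table.imu_cache);
		if (!imu)
			imu = kmalloc(IO_IMU_CACHE_SIZE, GFP_KERNEL);
		return imu;
	}
	/* anything bigger falls back to a plain kvmalloc */
	return kvmalloc(struct_size_t(struct io_mapped_ubuf, bvec, nr_bvecs),
			GFP_KERNEL);
}

static void io_free_imu(struct io_ring_ctx *ctx, struct io_mapped_ubuf *imu)
{
	/* single constant compare in the fast path, no ->elem_size */
	if (imu->nr_bvecs > IO_CACHED_BVEC_SEGS ||
	    !io_alloc_cache_put(&ctx->buf_table.imu_cache, imu))
		kvfree(imu);
}

That keeps the freeing side to one constant comparison, and nothing
outside rsrc.c has to know about ->elem_size.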