Re: [RFC] support memory recycle for ring-mapped provided buffer

Dylan Yudaken <dylany@xxxxxx> · Tue, 14 Jun 2022 08:38:32 +0000

On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
> On 6/12/22 15:30, Hao Xu wrote:
> > On 6/10/22 13:55, Hao Xu wrote:
> > > Hi all,
> > > 
> > > I've actually written most of the code for this, but I think it's
> > > necessary to first ask the community for comments on the design.
> > > What I do is, when consuming a buffer, don't increment the head,
> > > but check the length actually used. Then update the buffer info
> > > like
> > > buff->addr += len, buff->len -= len;
> > > (of course, if a req consumes the whole buffer, just increment
> > > head)
> > > and since we have now changed the addr of the buffer, a simple
> > > buffer id is useless for userspace to get the data. We have to
> > > deliver the original addr back to userspace through cqe->extra1,
> > > which means this feature needs CQE32 to be on.
> > > This way a provided buffer may be split into many pieces, and
> > > userspace should track each piece; when all the pieces are free
> > > again, they can re-provide the buffer. (They can surely re-provide
> > > each piece separately, but that causes more and more memory
> > > fragmentation; anyway, it's the user's choice.)
> > > 
> > > What do you think of this? Actually I'm not a fan of big CQEs;
> > > it's not ideal to have the limitation of requiring CQE32, but
> > > there seems to be no other option?
> 
> Another way is two rings, just like sqring and cqring. Users provide
> buffers to sqring, the kernel fetches them, and when data is there
> puts it to cqring for users to read. The downside is we need to copy
> the buffer metadata, and there is a limit on how many times we can
> split the buffer since the cqring has a finite length.
> 
> > > 
> > > Thanks,
> > > Hao
> > 
> > To implement this, CQE32 handling has to be introduced almost
> > everywhere.
> > For example for io_issue_sqe:
> > 
> > def->issue();
> > if (unlikely(CQE32))
> >      __io_req_complete32();
> > else
> >      __io_req_complete();
> > 
> > which will certainly add some overhead to the main path. Any
> > comments?
> > 
> > Regards,
> > Hao
> > 
> 

I find the idea interesting, but is it definitely worth doing? 

Other downsides I see with this approach:
* userspace would have to keep track of when a buffer is finished. This
might get complicated. 
* there is a problem of tiny writes - would we want to support a
minimum buffer size?
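
To make the first point concrete, here is a rough sketch of the kind of
bookkeeping userspace would need per provided buffer. All names here
(buf_tracker, tracker_on_completion, tracker_release) are hypothetical
and not taken from any patch; it assumes the kernel hands out pieces
front to back, as in the addr += len scheme quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-buffer bookkeeping kept by the application. */
struct buf_tracker {
    size_t size;       /* full length of the provided buffer */
    size_t handed_out; /* bytes the kernel has filled so far */
    size_t released;   /* bytes the application has finished with */
};

static void tracker_init(struct buf_tracker *t, size_t size)
{
    t->size = size;
    t->handed_out = 0;
    t->released = 0;
}

/* Record a completion that filled len bytes; returns the piece's offset. */
static size_t tracker_on_completion(struct buf_tracker *t, size_t len)
{
    assert(len <= t->size - t->handed_out);
    size_t off = t->handed_out;
    t->handed_out += len;
    return off;
}

/*
 * Record that the application is done with len bytes of this buffer.
 * Returns true once the whole buffer has been handed out and released,
 * i.e. when it can be re-provided in one piece.
 */
static bool tracker_release(struct buf_tracker *t, size_t len)
{
    assert(len <= t->handed_out - t->released);
    t->released += len;
    return t->handed_out == t->size && t->released == t->size;
}
```

Pieces may be released in any order; only the byte totals matter here.
A real application would also need to decide what to do with a buffer
that is never completely filled.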

I think in general it can be achieved using the existing buffer ring,
leaving the management to userspace. For example, if a user prepares a
ring with N large buffers, on each completion the user is free to
requeue that buffer without the recently completed chunk.
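
Roughly what I have in mind, as a self-contained model rather than real
io_uring code. The struct and function names below are made up for
illustration; with liburing the add and publish steps would presumably
map onto io_uring_buf_ring_add() and io_uring_buf_ring_advance(). The
min_len cutoff and reusing the same bid for the remainder are my
assumptions, and the sketch does not check for a full ring.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 8u /* must be a power of two */

/* Hypothetical model of one buffer-ring slot. */
struct ring_entry {
    uint64_t addr;
    uint32_t len;
    uint16_t bid;
};

/* Hypothetical model of the shared ring: fill a slot, then publish tail. */
struct buf_ring_model {
    struct ring_entry entries[RING_ENTRIES];
    _Atomic uint16_t tail;
};

static void buf_ring_model_init(struct buf_ring_model *br)
{
    atomic_init(&br->tail, 0);
}

/*
 * A completion used the first `used` bytes of buffer (addr, len, bid).
 * Requeue the unused remainder. Returns 1 if it was queued, 0 if nothing
 * is left or the remainder is smaller than min_len.
 */
static int requeue_remainder(struct buf_ring_model *br, uint64_t addr,
                             uint32_t len, uint32_t used, uint16_t bid,
                             uint32_t min_len)
{
    assert(used <= len);
    uint32_t rest = len - used;
    if (rest == 0 || rest < min_len)
        return 0;

    uint16_t tail = atomic_load_explicit(&br->tail, memory_order_relaxed);
    struct ring_entry *e = &br->entries[tail & (RING_ENTRIES - 1)];
    e->addr = addr + used;
    e->len = rest;
    e->bid = bid;

    /* Publish the slot: this is the extra atomic operation per requeue. */
    atomic_store_explicit(&br->tail, (uint16_t)(tail + 1),
                          memory_order_release);
    return 1;
}
```

Since the remainder keeps its bid but starts at a new address, the
application has to remember the current start address per bid in order
to locate the data on the next completion.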

The downsides I see here are:
 * there is a delay before the buffer is requeued. This might cause more
ENOBUFS. I 'feel' this will not be a big problem in practice
 * there is an additional atomic increment on the ring

Do you feel the wins are worth the extra complexity?