Re: [RFC] support memory recycle for ring-mapped provided buffer

Hi Dylan,

On 6/14/22 16:38, Dylan Yudaken wrote:
On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
On 6/12/22 15:30, Hao Xu wrote:
On 6/10/22 13:55, Hao Xu wrote:
Hi all,

I've actually written most of the code for this, but I think it's necessary
to first ask the community for comments on the design. What I do is: when
consuming a buffer, don't increment the head, but check the length actually
used. Then update the buffer info like

buff->addr += len, buff->len -= len;

(of course, if a request consumes the whole buffer, just increment the head.)
Since we have now changed the addr of the buffer, a simple buffer id is no
longer enough for userspace to locate the data. We have to deliver the
original addr back to userspace through cqe->extra1, which means this feature
needs CQE32 to be on.
This way a provided buffer may be split into many pieces, and userspace
should track each piece; when all the pieces are free again, it can
re-provide the buffer. (It can of course re-provide each piece separately,
but that causes more and more memory fragmentation; anyway, it's the user's
choice.)

What do you think of this? Actually I'm not a fan of the big CQE; it's not
ideal to have the limitation of requiring CQE32 to be on, but there seems to
be no other option?
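
To make the idea above a bit more concrete, here is a minimal sketch of the
bookkeeping (just an illustration of the scheme, not the actual kernel code;
the struct and helper names are made up):

#include <stdbool.h>
#include <stdint.h>

/* made-up stand-in for the kernel's view of one provided buffer */
struct pbuf {
	uint64_t addr;	/* current start of the unused region */
	uint32_t len;	/* bytes still available */
	uint16_t bid;	/* buffer id */
};

/*
 * Consume 'used' bytes from 'buf'. Instead of advancing the ring head,
 * shrink the buffer in place; only a fully consumed buffer advances the
 * head. The original address is returned so it can be reported back to
 * userspace in cqe->extra1 (hence the CQE32 requirement).
 */
static uint64_t pbuf_consume(struct pbuf *buf, uint32_t used, bool *advance_head)
{
	uint64_t orig_addr = buf->addr;

	if (used >= buf->len) {
		/* whole buffer gone: behave like today, bump the head */
		*advance_head = true;
	} else {
		/* carve off the consumed prefix, keep the tail provided */
		buf->addr += used;
		buf->len  -= used;
		*advance_head = false;
	}
	return orig_addr;
}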

Another way is two rings, just like the sqring and cqring. Users provide
buffers to the sq-like ring; the kernel fetches them and, when data is there,
puts them on the cq-like ring for users to read. The downside is that we need
to copy the buffer metadata, and there is a limit on how many times we can
split a buffer since the cqring has a fixed length. A rough sketch of the
entries follows below.
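
Roughly, the two rings could carry entries like the following (purely a
hypothetical layout to make the idea concrete; none of these structs exist
in the uapi):

#include <stdint.h>

/* entry userspace posts on the provide (sq-like) ring */
struct pbuf_provide_entry {
	uint64_t addr;	/* start of the buffer being handed to the kernel */
	uint32_t len;	/* size of the buffer */
	uint16_t bid;	/* buffer id chosen by userspace */
	uint16_t resv;
};

/*
 * Entry the kernel posts on the completion (cq-like) ring once data has
 * landed. Copying this metadata back is the cost mentioned above, and
 * the fixed ring length bounds how many pieces a buffer can be split into.
 */
struct pbuf_complete_entry {
	uint64_t addr;	/* where the data for this piece starts */
	uint32_t len;	/* bytes of data in this piece */
	uint16_t bid;	/* buffer the piece came from */
	uint16_t resv;
};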


Thanks,
Hao

To implement this, CQE32 has to be introduced almost everywhere.
For example, in io_issue_sqe:

def->issue();
if (unlikely(CQE32))
      __io_req_complete32();
else
      __io_req_complete();

which will certainly add some overhead to the main path. Any comments?

Regarding this downside, I think there is a way to limit it to only the
read/recv path.


Regards,
Hao



I find the idea interesting, but is it definitely worth doing?

Other downsides I see with this approach:
* userspace would have to keep track of when a buffer is finished. This
might get complicated.
This one is fine I think, since users can choose not to enable this feature,
and if they do use it, they can choose not to track the buffer but to
re-provide each piece immediately.
(When a user registers the pbuf ring, they can pass a flag to enable this
feature; see the registration sketch after this list.)

* there is a problem of tiny writes - would we want to support a
minimum buffer size?

Sorry I'm not following here, why do we need to have a min buffer size?
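
For illustration, opting in at registration time could look roughly like the
sketch below with liburing. The flag name and value are purely hypothetical,
invented for this sketch; only io_uring_register_buf_ring() and
struct io_uring_buf_reg are real, and the sketch assumes a flags field in
that struct.

#include <liburing.h>

/* hypothetical flag, does not exist in the uapi today */
#define IOU_PBUF_RING_PARTIAL	(1U << 15)

static int register_partial_pbuf_ring(struct io_uring *ring, void *ring_mem,
				      unsigned int entries, int bgid)
{
	struct io_uring_buf_reg reg = {
		.ring_addr	= (unsigned long)ring_mem,
		.ring_entries	= entries,
		.bgid		= bgid,
		/* ask the kernel to do partial consumption on this ring */
		.flags		= IOU_PBUF_RING_PARTIAL,
	};

	return io_uring_register_buf_ring(ring, &reg, 0);
}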


I think in general this can be achieved using the existing buffer ring,
leaving the management to userspace. For example, if a user prepares a ring
with N large buffers, on each completion the user is free to requeue that
buffer without the recently completed chunk.

[1]
I see, I was not aware of this...
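
As a rough sketch of this approach [1] (fixed-size buffers; the helper and
the per-bid offset tracking are my own invention, while the
io_uring_buf_ring_* calls are real liburing API):

#include <liburing.h>
#include <stddef.h>

#define BUF_SIZE	(64 * 1024)
#define NR_BUFS		64		/* ring entries, power of two */

static size_t cur_off[NR_BUFS];		/* userspace-tracked start offset per bid */

/*
 * Called after a recv completion that used 'used' bytes of buffer 'bid'.
 * Requeue only the tail of the buffer, so the kernel never overwrites
 * data the application has not consumed yet.
 */
static void requeue_remainder(struct io_uring_buf_ring *br, char *bufs,
			      unsigned short bid, unsigned int used)
{
	char *base = bufs + (size_t)bid * BUF_SIZE;

	/* data for this completion started at base + cur_off[bid] */
	cur_off[bid] += used;
	if (cur_off[bid] < BUF_SIZE) {
		/* hand the unused tail straight back to the kernel */
		io_uring_buf_ring_add(br, base + cur_off[bid],
				      BUF_SIZE - cur_off[bid], bid,
				      io_uring_buf_ring_mask(NR_BUFS), 0);
		io_uring_buf_ring_advance(br, 1);
	}
	/*
	 * If the buffer is fully used, reset cur_off[bid] and re-provide
	 * the whole buffer once the application has consumed the data;
	 * that delay is the first downside discussed below.
	 */
}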


The downsides here I see are:
  * there is a delay in requeuing the buffer. This might cause more ENOBUFS,
though practically I 'feel' this will not be a big problem.
  * there is an additional atomic increment on the ring

Do you feel the wins are worth the extra complexity?

Personally speaking, the only downside of my first approach is the overhead
of CQE32 on the iopoll completion path and the read/recv/recvmsg path. But
[1] looks fine too... TBH I'm not sure which one is better.

Thanks,
Hao






