On Tue, 2022-06-14 at 14:26 +0800, Hao Xu wrote:
> On 6/12/22 15:30, Hao Xu wrote:
> > On 6/10/22 13:55, Hao Xu wrote:
> > > Hi all,
> > >
> > > I've actually done most of the code for this, but I think it's
> > > necessary to first ask the community for comments on the design.
> > > What I do is, when consuming a buffer, don't increment the head,
> > > but check the length in real use. Then update the buffer info like
> > > buff->addr += len, buff->len -= len;
> > > (of course, if a req consumes the whole buffer, just increment
> > > head)
> > > and since we have now changed the addr of the buffer, a simple
> > > buffer id is useless for userspace to get the data. We have to
> > > deliver the original addr back to userspace through cqe->extra1,
> > > which means this feature needs CQE32 to be on.
> > > This way a provided buffer may be split into many pieces, and
> > > userspace should track each piece. When all the pieces are spare
> > > again, they can re-provide the buffer. (They can surely re-provide
> > > each piece separately, but that causes more and more memory
> > > fragments; anyway, it's the users' choice.)
> > >
> > > What do you think of this? Actually I'm not a fan of big cqe, it's
> > > not perfect to have the limitation of having CQE32 on, but there
> > > seems to be no other option?
>
> Another way is two rings, just like sqring and cqring. Users provide
> buffers to the sqring, the kernel fetches them, and when data is there
> puts it on the cqring for users to read. The downside is that we need
> to copy the buffer metadata, and there is a limit on how many times we
> can split the buffer since the cqring has a length.
>
> > >
> > > Thanks,
> > > Hao
> >
> > To implement this, CQE32 has to be introduced almost everywhere.
> > For example, for io_issue_sqe:
> >
> > def->issue();
> > if (unlikely(CQE32))
> >         __io_req_complete32();
> > else
> >         __io_req_complete();
> >
> > which will certainly have some overhead for the main path. Any
> > comments?
> >
> > Regards,
> > Hao
> >

I find the idea interesting, but is it definitely worth doing?

Other downsides I see with this approach:
* userspace would have to keep track of when a buffer is finished. This
  might get complicated.
* there is a problem of tiny writes - would we want to support a minimum
  buffer size?

I think in general the same effect can be achieved with the existing
buffer ring, leaving the management to userspace. For example, if a user
prepares a ring with N large buffers, on each completion the user is
free to requeue that buffer without the recently completed chunk.

The downsides I see here are:
* there is a delay before the buffer is requeued. This might cause more
  ENOBUFS. My feeling is that this will not be a big problem in
  practice.
* there is an additional atomic increment on the ring

Do you feel the wins are worth the extra complexity?
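
To make the userspace alternative a bit more concrete, here is a rough sketch of the bookkeeping I have in mind. It is only an illustration under my own assumptions: struct big_buf, struct slice, big_buf_consume() and big_buf_recycle() are made-up names, not part of liburing or the kernel, and the code only tracks offsets; it does not touch a real ring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace bookkeeping for one large provided buffer.
 * None of these names come from liburing or the kernel. */
struct big_buf {
	char   *base;   /* start of the original allocation */
	size_t  size;   /* total size of the allocation */
	size_t  off;    /* offset of the first byte not yet filled */
};

/* The region that should be handed back to the buffer ring. */
struct slice {
	char   *addr;
	size_t  len;
};

/*
 * Called when a completion reports that `len` bytes were written at the
 * front of the region currently queued for `b`. Returns the remainder to
 * requeue. A zero-length result means the buffer is fully used and there
 * is nothing left to requeue until it is recycled.
 */
static struct slice big_buf_consume(struct big_buf *b, size_t len)
{
	struct slice rest;

	/* A completion can never report more than what was queued. */
	assert(len <= b->size - b->off);

	b->off += len;
	rest.addr = b->base + b->off;
	rest.len = b->size - b->off;
	return rest;
}

/*
 * Called once the application has processed all data in the buffer, so
 * the whole allocation can be provided again as a single entry.
 */
static struct slice big_buf_recycle(struct big_buf *b)
{
	struct slice whole = { b->base, b->size };

	b->off = 0;
	return whole;
}
```

On each completion the application would look up the big_buf for the reported buffer id, call big_buf_consume() with the completed length, and requeue the returned region under the same id if its length is non-zero (with liburing that would presumably be io_uring_buf_ring_add() followed by io_uring_buf_ring_advance()). The data for that completion starts at the offset the buffer had before the call, so no extra address needs to come back in the CQE.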