Re: [PATCHSET RFC 0/7] Add support for provided registered buffers

Jens Axboe <axboe@xxxxxxxxx> · Thu, 24 Oct 2024 09:11:56 -0600

On 10/24/24 9:04 AM, Pavel Begunkov wrote:
> On 10/24/24 15:43, Jens Axboe wrote:
>> On 10/24/24 8:36 AM, Pavel Begunkov wrote:
>>> On 10/23/24 17:07, Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> Normally a request can take a provided buffer, which means "pick a
>>>> buffer from group X and do IO to/from it", or it can use a registered
>>>> buffer, which means "use the buffer at index Y and do IO to/from it".
>>>> For things like O_DIRECT and network zero copy, registered buffers can
>>>> be used to speed up the operation, as they avoid repeated
>>>> get_user_pages() and page referencing calls for each IO operation.
>>>>
>>>> Normal (non zero copy) send supports bundles, which is a way to pick
>>>> multiple provided buffers at once and send them. send zero copy only
>>>> supports registered buffers, and hence can only send a single buffer
>>>
>>> That's not true, has never been, send[msg] zc work just fine with
>>> normal (non-registered) buffers.
>>
>> That's not what I'm saying, perhaps it isn't clear. What I'm trying to
>> say is that it only supports registered buffers, it does not support
>> provided buffers. It obviously does support regular user provided
>> buffers that aren't registered or provided, I figured that goes without
>> saying explicitly.
> 
> Normally goes without saying yes, but the confusion here is because
> of a more or less explicit implication (or at least I read it so)
> "it only supports registered buffers => selected buffer support
> should support registered buffers, which it adds"

I'll expand it to be more clear.

> Does the series allow provided buffers with normal user memory?

Yep, it should allow picking one (or more, for bundles) provided
buffers, and each provided buffer is either normal user memory or an
index into a registered buffer.
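
To make the second case concrete, here is a rough sketch of how the
address field of a provided buffer could carry a registered buffer index
plus an offset, as the cover letter quoted below describes. The bit
layout (index in the upper 32 bits, offset in the lower 32 bits) and all
of the ex_* names are assumptions made up for this illustration; the
actual encoding is whatever the patches define.

```c
#include <stdint.h>

/*
 * Hypothetical layout, for illustration only: registered buffer index in
 * the upper 32 bits of the provided buffer's addr field, byte offset into
 * that registered buffer in the lower 32 bits.
 */
#define EX_REGBUF_INDEX_SHIFT	32

/* Pack a registered buffer index and an offset into one 64-bit addr value */
static inline uint64_t ex_encode_regbuf(uint32_t index, uint32_t offset)
{
	return ((uint64_t)index << EX_REGBUF_INDEX_SHIFT) | offset;
}

/* Recover the registered buffer index from an encoded addr value */
static inline uint32_t ex_regbuf_index(uint64_t addr)
{
	return (uint32_t)(addr >> EX_REGBUF_INDEX_SHIFT);
}

/* Recover the byte offset within the registered buffer */
static inline uint32_t ex_regbuf_offset(uint64_t addr)
{
	return (uint32_t)(addr & 0xffffffffULL);
}
```

With something like this, the application would fill the provided
buffer's address with ex_encode_regbuf(index, offset) rather than a
pointer whenever the request asks for both fixed and provided buffers.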

>>>> This patchset adds support for using a mix of provided and registered
>>>> buffers, where the provided buffers merely provide an index into which
>>>> registered buffers to use. This enables using provided buffers for
>>>> send zc in general, but also bundles where multiple buffers are picked.
>>>> This is done by changing how the provided buffers are interpreted.
>>>> Normally a provided buffer has an address, length, and buffer ID
>>>> associated with it. The address tells the kernel where the IO should
>>>> occur. If both fixed and provided buffers are asked for, the provided
>>>> buffer address field is instead an encoding of the registered buffer
>>>> index and the offset within that buffer. With that in place, using a
>>>> combination of the two can work.
>>>
>>> What the series doesn't say is how it works with notifications and
>>> what is the proposed user API in regard to it, it's the main if not
>>> the only fundamental distinctive part of the SENDZC API.
>>
>> Should not change that? You should get the usual two notifications on
>> send complete, and reuse safe.
> 
> Right you get a notification, but what is it supposed to mean to
> the user? Like "the notification indicates that all buffers that
> are consumed by this request can be reused". Multishot is not a
> thing, but how the user has to track what buffers are consumed
> by this request? I assume it posts a CQE per buffer completion,
> right?

Depends on if it's bundles or not. For a non-bundle, a single buffer is
picked, and that buffer is either user memory or it's an index into a
registered buffer. For that, completions work just like they do now - a
single one is posted for the expected inline completion with buffer ID,
and one is posted for reuse later.
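
As a sketch of what the application side of that could look like, the
snippet below tells the two completions apart and pulls the buffer ID
out of the first one. The flag values mirror the existing io_uring uapi
definitions (IORING_CQE_F_BUFFER, IORING_CQE_F_MORE, IORING_CQE_F_NOTIF,
IORING_CQE_BUFFER_SHIFT) but are redefined locally so it builds
standalone. struct ex_cqe and the ex_* helpers are hypothetical names
for this example only, not part of the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the io_uring uapi CQE flag values */
#define EX_CQE_F_BUFFER		(1U << 0)	/* IORING_CQE_F_BUFFER */
#define EX_CQE_F_MORE		(1U << 1)	/* IORING_CQE_F_MORE */
#define EX_CQE_F_NOTIF		(1U << 3)	/* IORING_CQE_F_NOTIF */
#define EX_CQE_BUFFER_SHIFT	16		/* IORING_CQE_BUFFER_SHIFT */

/* Stand-in with the same leading fields as struct io_uring_cqe */
struct ex_cqe {
	uint64_t user_data;
	int32_t res;
	uint32_t flags;
};

enum ex_cqe_kind {
	EX_SEND_DONE,	/* inline completion: res is the send result */
	EX_REUSE_SAFE,	/* notification: buffer(s) may be reused */
};

/* The reuse notification is the CQE that has the NOTIF flag set */
static inline enum ex_cqe_kind ex_classify(const struct ex_cqe *cqe)
{
	return (cqe->flags & EX_CQE_F_NOTIF) ? EX_REUSE_SAFE : EX_SEND_DONE;
}

/* Returns true and stores the buffer ID if this CQE carries one */
static inline bool ex_cqe_bid(const struct ex_cqe *cqe, uint16_t *bid)
{
	if (!(cqe->flags & EX_CQE_F_BUFFER))
		return false;
	*bid = (uint16_t)(cqe->flags >> EX_CQE_BUFFER_SHIFT);
	return true;
}
```

The application would remember the buffer ID from the EX_SEND_DONE
completion, keyed by user_data, and only recycle that buffer once the
matching EX_REUSE_SAFE completion shows up.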

If it's a bundle, it works the same way: two completions are posted. The
first expected inline one will have a buffer ID, and the length will
tell you how many consecutive buffers were sent/consumed. Then the reuse
notification goes with that previous completion.
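
For the bundle case, turning "starting buffer ID plus length" into a
buffer count could be done along these lines. This is only a sketch
under a few assumptions: the application keeps a table of buffer lengths
indexed by buffer ID, buffer IDs match ring slots, the ring size is a
power of two, and a partially sent final buffer counts as consumed.
ex_bundle_consumed() is a made-up helper name.

```c
#include <stdint.h>

/*
 * Walk the buffer table from the starting buffer ID and count how many
 * consecutive buffers are covered by 'sent' bytes. buf_lens[] is indexed
 * by buffer ID and nr_bufs must be a power of two. A buffer that was only
 * partially sent is still counted.
 */
static inline unsigned ex_bundle_consumed(const uint32_t *buf_lens,
					  unsigned nr_bufs,
					  unsigned start_bid, uint32_t sent)
{
	unsigned mask = nr_bufs - 1;
	unsigned n = 0;

	while (sent && n < nr_bufs) {
		uint32_t len = buf_lens[(start_bid + n) & mask];

		/* Never subtract more than what is left to account for */
		sent -= len < sent ? len : sent;
		n++;
	}
	return n;
}
```

The first completion would give start_bid and sent (the CQE result), and
the buffers counted here are the ones that become reusable once the
reuse notification for that request arrives.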

> And let's say you have send heavy workload where the user pushes
> more than the socket can take, i.e. it has to wait to send more
> and there is always something to send. Does it poll-retry as it's
> usually done for multishots? How notifications are paced? i.e.
> it'll continue hooking more and more buffers onto the same
> notification locking all the previously used buffers.

If it sends nothing, nothing is consumed. If it's a partial send, then
buffers are kept (as nobody else should send them anyway), and it's
retried based on the poll trigger. For the latter case, the completion is
posted at the end, when the picked buffers are done. For pacing, you can
limit the amount of data sent by just setting ->len to your desired
bundle/batch size.

-- 
Jens Axboe