Re: [PATCH 0/4] implement vectored registered buffers for sendzc

Pavel Begunkov <asml.silence@xxxxxxxxx> · Thu, 24 Oct 2024 23:51:33 +0100

On 10/24/24 23:22, Jens Axboe wrote:
On 10/24/24 4:14 PM, Pavel Begunkov wrote:
On 10/24/24 20:56, Jens Axboe wrote:
On 10/24/24 12:13 PM, Pavel Begunkov wrote:
On 10/24/24 19:00, Jens Axboe wrote:
On 10/24/24 11:56 AM, Pavel Begunkov wrote:
On 10/24/24 18:19, Jens Axboe wrote:
On 10/24/24 10:06 AM, Pavel Begunkov wrote:
On 10/24/24 16:45, Jens Axboe wrote:
...

Seems like you're agreeing but then stating the opposite, so there
is some confusion. I'm saying that IMHO the right way API-wise
is to resolve an imu at issue time, just like it's done for fixed
files, and what your recent series did for send zc.

Yeah early morning confusion I guess. And I do agree in principle,
though for registered buffers, those have to be registered upfront
anyway, so no confusion possible with prep vs issue there. For provided
buffers, it only matters for the legacy ones, which generally should not
be used. Doesn't change the fact that you're technically correct, the
right time to resolve them would be at issue time.

I'm talking about sendmsg with iovec. Registered buffers should
be registered upfront, that's right, but iovec should be copied
at prep, and finally resolved into bvecs incl the imu/buffer lookup
at the issue time. And those are two different points in time,
maybe because of links, draining or anything else. And if they
should be at different moments, there is no way to do it while
copying iovec.

Oh I totally follow, the incremental approach would only work if it can
be done at prep time. If at issue time, then it has to turn an existing
iovec array into the appropriate bvec array. And that's where you'd have
to do some clever bits to avoid holding both a full bvec and iovec array
in memory, which would be pretty wasteful/inefficient. If done at issue

Why would it be wasteful and inefficient? No more than jumping
through that incremental infra for each chunk, doubling the size
of the array / reallocating / memcpy'ing it, instead of a tight
loop doing the entire conversion.

Because it would prevent doing an iovec-at-a-time import, then turning
it into the desired bvec. That's one loop instead of two. You would have
the space upfront, there should be no need to realloc+memcpy. And then
there's the space concern, where the initial import is an iovec, and
then you need a bvec. For 64-bit that's fine as they take up the same
amount of space,

That's not true, each iov can produce multiple bvec entries so
iovs might get overwritten if you do it the simplest way.

What part isn't true? Yeah one iovec can turn into multiple bvec
segments, the provided send zc stuff I sent does deal with that. So yeah
it's not necessarily a 1:1 mapping, and even if they have the same size,
you may need more elements on the bvec side.

Ok, you didn't state why 64 bit is fine. What I'm saying is that,
irrespective of the element size, if you have an iovec array with
just enough free space at the end that the resulting bvec fits
once the iovecs are overwritten, a simple in-place algorithm going
from left to right will still fail.
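
To make the failure mode concrete, here is a minimal userspace sketch,
not kernel code. Everything in it is an assumption made for
illustration: struct slot stands in for both iovec and bvec entries
(same element size), SEG is a hypothetical per-segment byte limit, and
convert_left_aligned() is a made-up helper. The iovecs sit at the start
of the array and the output is written from slot 0, so one iovec that
expands into two segments clobbers the next iovec before it is read.

```c
#include <assert.h>
#include <stddef.h>

#define SEG 4096u /* hypothetical max bytes per output segment */

/* Stand-in for both an iovec and a bvec entry (same size, by assumption). */
struct slot {
	size_t base;
	size_t len;
};

/* Number of output segments one input entry expands into. */
static size_t nsegs(size_t len)
{
	return (len + SEG - 1) / SEG;
}

/*
 * Naive in-place conversion: inputs live in a[0..n), output is written
 * starting at a[0]. Returns the index of the first input that had been
 * overwritten before it was read, or -1 if the conversion succeeded.
 */
static int convert_left_aligned(struct slot *a, size_t n)
{
	size_t w = 0; /* next output slot */

	assert(a != NULL);
	for (size_t r = 0; r < n; r++) {
		if (w > r)
			return (int)r; /* a[r] already holds output data */

		struct slot in = a[r];
		size_t k = nsegs(in.len);

		for (size_t s = 0; s < k; s++) {
			size_t off = s * SEG;
			size_t left = in.len - off;

			a[w].base = in.base + off;
			a[w].len = left < SEG ? left : SEG;
			w++;
		}
	}
	return -1;
}
```

With two inputs of 8192 and 4096 bytes in a three-slot array, the
total output (three segments) fits, yet the second input is lost after
the first one has been expanded.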

Doesn't change the fact that you can loop once and do it. If you need to
expand the bvec size, that would be a realloc+copy. But that part is
true even if you first import all iovecs, and then iterate them to map
the bvecs. Unless you do some upfront tracking to know how many elements
you need, but that would seem overly convoluted. With caching, the
expansion should be a rare occurrence outside of the initial import into
a new region.

but for 32-bit it'd make incremental importing from a
stable iovec to a bvec array a bit more tricky (and would need realloc,
unless you over-alloc'ed for the iovec array upfront).

And that's not true, you can still very well do it in place if
the iovec is placed right in the memory. As I explicitly noted,
there are simple enough ways to do it in place without extra
reallocs.

I don't think anything stated there is untrue, just saying it's a bit

"and would need realloc, unless you over-alloc'ed for the iovec array
upfront", that one is what's not true, if in the second part you're
talking about array_size = bvec_size + iovec_size (in bytes). All you
need is the max of the two to do it in place.
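
One way to read the "max of the two" remark, as a userspace sketch
only and not the actual patch: place the iovecs at the tail of an
array sized for max(iovec, bvec) and convert from left to right. The
names struct slot, SEG and convert_right_aligned() are hypothetical,
both entry types are assumed to have the same size, the output count
is assumed to be known upfront, and every iovec is assumed to be
non-empty so it yields at least one segment.

```c
#include <assert.h>
#include <stddef.h>

#define SEG 4096u /* hypothetical max bytes per output segment */

/* Stand-in for both an iovec and a bvec entry (same size, by assumption). */
struct slot {
	size_t base;
	size_t len;
};

/* Number of output segments one input entry expands into. */
static size_t nsegs(size_t len)
{
	return (len + SEG - 1) / SEG;
}

/*
 * In-place conversion with the n inputs stored at the tail of the
 * array, a[cap - n .. cap), where cap >= max(n, total output segments).
 * Output is written from a[0]. Returns the number of segments written.
 */
static size_t convert_right_aligned(struct slot *a, size_t cap, size_t n)
{
	size_t w = 0; /* next output slot */

	assert(n <= cap);
	for (size_t r = cap - n; r < cap; r++) {
		struct slot in = a[r]; /* copy out, a[r] may be reused below */
		size_t k = nsegs(in.len);

		assert(in.len > 0); /* each input must yield >= 1 segment */
		for (size_t s = 0; s < k; s++) {
			size_t off = s * SEG;
			size_t left = in.len - off;

			/* never runs past the input that was just read */
			assert(w <= r);
			a[w].base = in.base + off;
			a[w].len = left < SEG ? left : SEG;
			w++;
		}
	}
	return w;
}
```

The write index can never pass the read index here: every input still
unread needs at least one output slot, and the inputs end exactly at
the end of the array, so the slots between w and the next unread input
always cover what the current input expands into.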

more tricky. Which is certainly true, if it's the same memory region and
there's overlaps. But let's just see the code for it, much easier to
discuss over those parts rather than pontificate hypotheticals :-)

--
Pavel Begunkov