On 10/24/24 10:06 AM, Pavel Begunkov wrote: > On 10/24/24 16:45, Jens Axboe wrote: >> On 10/24/24 9:29 AM, Pavel Begunkov wrote: >>> On 10/23/24 14:52, Jens Axboe wrote: >>>> On 10/22/24 8:38 PM, Pavel Begunkov wrote: >>>>> Allow registered buffers to be used with zerocopy sendmsg, where the >>>>> passed iovec becomes a scatter list into the registered buffer >>>>> specified by sqe->buf_index. See patches 3 and 4 for more details. >>>>> >>>>> To get performance out of it, it'll need a bit more work on top for >>>>> optimising allocations and cleaning up send setups. We can also >>>>> implement it for non zerocopy variants and reads/writes in the future. >>>>> >>>>> Tested by enabling it in test/send-zerocopy.c, which checks payloads, >>>>> and exercises lots of corner cases, especially around send sizes, >>>>> offsets and non aligned registered buffers. >>>> >>>> Just for the edification of the list readers, Pavel and I discussed this >>>> a bit last night. There's a decent amount of overlap with the send zc >>>> provided + registered buffers work that I did last week, but haven't >>>> posted yet. It's here; >>>> >>>> https://git.kernel.dk/cgit/linux/log/?h=io_uring-sendzc-provided >>>> >>>> in terms of needing and using both bvec and iovec in the array, and >>>> having the suitable caching for the arrays rather than needing a full >>>> alloc + free every time. >>> >>> And as I mentioned, that can be well done in-place (in the same >>> array) as one option. >> >> And that would be the way to do it, like I mentioned as well, that is >> how my first iteration of the above did it too. But since this one just >> needs to end up with an array of bvec, it was pointless for my series to >> do the iovec import and only then turn it into bvecs. >> >> So somewhat orthogonal, as the use cases aren't exactly the same. One >> deals with iovecs out of necessity, the other one only previously did as >> a matter of convenience to reuse existing iovec based helpers. >> >>>> The send zc part can map into bvecs upfront and hence don't need the >>>> iovec array storage at the same time, which this one does as the sendmsg >>>> zc opcode needs to import an iovec. But perhaps there's a way to still >>>> unify the storage and retain the caching, without needing to come up >>>> with something new. >>> >>> I looked through. The first problem is that your thing consuming >>> entries from the ring, with iovecs it'd need to be reading it >>> from the user one by one. Considering allocations in your helpers, >>> that would turn it into a bunch of copy_from_user, and it might >>> just easier and cleaner to copy the entire iovec. >> >> I think for you case, incremental import would be the way to go. Eg >> something ala: >> >> if (!user_access_begin(user_iovec, nr_vecs * sizeof(*user_iovec))) >> return -EFAULT; > > Is it even legal? I don't know how it's implemented specifically > but I assume there can be any kind of magic, e.g. intercepting > page faults. I'd definitely prefer to avoid anything but the simplest > handling like read from/write to memory, e.g. no allocations. > >> >> bv = kmsg->bvec; >> for_each_iov { >> struct iovec iov; >> >> unsafe_get_user(iov.iov_base, &user_iovec->iov_base, foo); >> unsafe_get_user(iov.iov_len, &user_iovec->iov_len, foo); >> >> import_to_bvec(bv, &iov); >> >> user_iovec++; >> bv++; >> } >> >> if it can be done at prep time, because then there's no need to store >> the iovec at all, it's already stable, just in bvecs. And this avoids >> overlapping iovec/bvec memory, and it avoids doing two iterations for >> import. Purely a poc, just tossing out ideas. >> >> But I haven't looked too closely at your series yet. In any case, >> whatever ends up working for you, will most likely be find for the >> bundled zerocopy send (non-vectored) as well, and I can just put it on >> top of that. >> >>> And you just made one towards delaying the imu resolution, which >>> is conceptually the right thing to do because of the mess with >>> links, just like it is with fixed files. That's why it need to >>> copy the iovec at the prep stage and resolve at the issue time. >> >> Indeed, prep time is certainly the place to do it. And the above >> incremental import should work fine then, as we won't care abot >> user_iovec being stable being prep. > > Seems like you're agreeing but then stating the opposite, there > is some confusion. I'm saying that IMHO the right API wise way > is resolving an imu at issue time, just like it's done for fixed > files, and what your recent series did for send zc. Yeah early morning confusion I guess. And I do agree in principle, though for registered buffers, those have to be registered upfront anyway, so no confusion possible with prep vs issue there. For provided buffers, it only matters for the legacy ones, which generally should not be used. Doesn't change the fact that you're technically correct, the right time to resolve them would be at issue time. -- Jens Axboe