On 3/19/25 19:15, Stefan Metzmacher wrote:
On 19.03.25 at 19:37, Jens Axboe wrote:
On 3/19/25 11:45 AM, Joe Damato wrote:
On Wed, Mar 19, 2025 at 11:20:50AM -0600, Jens Axboe wrote:
...
My argument would be the same as for other features - if you can do it
simpler this other way, why not consider that? The end result would be
the same, you can do fast sendfile() with sane buffer reuse. But the
kernel side would be simpler, which is always a main goal for
those of us that have to maintain it.
Just adding sendfile2() works in the sense that it's an easier drop-in
replacement for an app, though the error queue side does mean it needs
to change anyway - it's not just replacing one syscall with another. And
if we want to be lazy, sure that's fine. I just don't think it's the
best way to do it when we literally have a mechanism that's designed for
this and works with reuse already with normal send zc (and receive side
too, in the next kernel).
A few months (or even years) back, Pavel came up with an idea
to implement some kind of splice into a fixed buffer. If that
were implemented, I guess it would help me in Samba too.
My first usage was on the receive side (from the network).
I did it as a testing ground for infra needed for ublk zerocopy,
but if that's of interest I can resurrect the patches and see
where it goes, especially since the aforementioned infra just got
queued.
But the other side might also be possible now that we have RWF_DONTCACHE.
Instead of dropping the pages from the page cache, it might
be possible to move them into a fixed buffer instead.
It would mean the pages would be 'stable' when they are
no longer part of the pagecache.
But maybe my assumption for that is too naive...
That's an interesting idea.
Anyway, that splice into a fixed buffer would be great to have,
as the new IORING_OP_RECV_ZC requires control over the
hardware queues of the NIC and only allows a single process
Right, it basically borrows a hardware rx queue and that
needs CAP_NET_ADMIN, and the user also has to set up steering
rules.
to provide buffers for that receive queue (at least that's how
I understand it). And that's not possible for multiple processes
(maybe not belonging to the same high level application and likely
It's up to the user to decide who returns buffers back (and how to
synchronise that) as the API is just a user mapped ring. Regardless,
it's not a finished project, David and I looked at features we want
to add to make life easier for multithreaded apps that can't throw
that many queues. I see your point though.
non-root applications). So it would be great to have splice into
a fixed buffer as an alternative to IORING_OP_SPLICE/IORING_OP_TEE,
as it would be more flexible to use in combination with
IORING_OP_SENDMSG_ZC as well as IORING_OP_WRITE[V]_FIXED with RWF_DONTCACHE.
I guess such a splice into a fixed buffer linked to IORING_OP_SENDMSG_ZC
would be the way to simulate sendfile2() in userspace?
Right, and that approach allows handling intermediate errors,
which is why it doesn't need to put restrictions on the input
file.
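
For illustration only, the two-step pattern (fill a buffer from the
file, then send that buffer) can be sketched with plain read()/send()
standing in for the io_uring ops, since splice into a fixed buffer is
only being discussed in this thread as a possible op. The helper name
send_via_buffer() and its return convention are made up for this
example. The point is that userspace sees the result of the fill step
before anything is sent, so a short read or an error can be handled
there:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy up to `len` bytes from `file_fd` (starting at
 * its current offset) to `sock_fd` through the caller-owned buffer `buf`
 * of `cap` bytes. Plain read()/send() stand in for "fill a fixed buffer"
 * and "send that buffer".
 *
 * Returns the number of bytes sent, or -errno on failure. A return value
 * smaller than `len` means the source hit EOF early (for example, the
 * file was truncated), which the caller can handle as it sees fit.
 */
ssize_t send_via_buffer(int sock_fd, int file_fd, size_t len,
                        char *buf, size_t cap)
{
    size_t total = 0;

    assert(buf != NULL && cap > 0);

    while (total < len) {
        size_t want = (len - total < cap) ? len - total : cap;

        /* Step 1: fill the buffer. Errors surface here, before any send. */
        ssize_t got = read(file_fd, buf, want);
        if (got < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        if (got == 0)
            break; /* early EOF: report a short count, not an error */

        /* Step 2: send exactly what was read, handling partial sends. */
        size_t sent = 0;
        while (sent < (size_t)got) {
            ssize_t n = send(sock_fd, buf + sent, (size_t)got - sent,
                             MSG_NOSIGNAL);
            if (n < 0) {
                if (errno == EINTR)
                    continue;
                return -errno;
            }
            sent += (size_t)n;
        }
        total += sent;
    }
    return (ssize_t)total;
}
```

In an io_uring version, the buffer would presumably be a registered
one and the fill step would be linked to the send, so the send is
skipped if the fill fails; that part is an assumption, not something
spelled out above.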
--
Pavel Begunkov