On 11/1/24 19:24, Mina Almasry wrote:
> On Fri, Nov 1, 2024 at 11:34 AM Pavel Begunkov <asml.silence@xxxxxxxxx> wrote:
...
>>> Huh, interesting. For devmem TCP we bind a region of memory to the
>>> queue once, and after that we can create N connections all reusing the
>>> same memory region. Is that not the case for io_uring? There are no
>>
>> Hmm, I think we already discussed the same question before. Yes, it
>> does indeed support an arbitrary number of connections. For what I was
>> saying above, the devmem TCP analogy would be attaching buffers to the
>> netlink socket instead of a tcp socket (that new xarray you added) when
>> you give it to user space. Then, you can close the connection after a
>> receive and the buffer you've got would still be alive.
>
> Ah, I see. You're making a tradeoff here. You leave the buffers alive
> after each connection so the userspace can still use them if it wishes
> but they are of course unavailable for other connections.
>
> But in our case (and I'm guessing yours) the process that will set up
> the io_uring memory provider/RSS/flow steering will be a different
> process from the one that sends/receives data, no? Because the former
> requires CAP_NET_ADMIN privileges while the latter will not. If they
> are 2 different processes, what happens when the latter process doing
> the send/receive crashes? Does the memory stay unavailable until the
> CAP_NET_ADMIN process exits? Wouldn't it be better to tie the lifetime
> of the buffers to the connection? Sure, the buffers will become

That's the tradeoff Google is willing to make in the framework,
which is fine, but it's not without cost, e.g. you need to
store/erase into the xarray, and it's a design choice in other
aspects, like you can't release the page pool if the socket you
got a buffer from is still alive but the net_iov hasn't been
returned.

> unavailable after the connection is closed, but at least you don't
> 'leak' memory on send/receive process crashes.
>
> Unless of course you're saying that only CAP_NET_ADMIN processes will

The user can pass the io_uring instance itself.

> run io_zcrx connections. Then they can do their own mp setup/RSS/flow
> steering and there is no concern when the process crashes because
> everything will be cleaned up. But that's a big limitation to put on
> the usage of the feature, no?

--
Pavel Begunkov
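
For readers following the thread, the two buffer-lifetime models being compared can be sketched as a toy Python model. This is not kernel code and does not mirror the devmem TCP or io_uring implementations; the names `Pool`, `SocketOwned` and `InstanceOwned` are made up for illustration, and a Python dict stands in for the per-socket xarray mentioned above.

```python
class Pool:
    """Toy stand-in for a fixed region of receive buffers."""

    def __init__(self, size):
        self.free = set(range(size))

    def take(self):
        return self.free.pop()

    def give_back(self, buf):
        self.free.add(buf)


class SocketOwned:
    """Buffers handed to user space are tracked per connection."""

    def __init__(self, pool):
        self.pool = pool
        self.held = {}  # connection id -> buffers given to user space

    def recv(self, conn):
        buf = self.pool.take()
        # Extra per-buffer bookkeeping: the store/erase cost from the thread.
        self.held.setdefault(conn, set()).add(buf)
        return buf

    def close(self, conn):
        # Closing (or crashing) returns everything the connection held,
        # so nothing leaks, but user space loses the buffers at close.
        for buf in self.held.pop(conn, set()):
            self.pool.give_back(buf)


class InstanceOwned:
    """Buffers are tracked by a longer-lived owner, not the connection."""

    def __init__(self, pool):
        self.pool = pool
        self.held = set()

    def recv(self, conn):
        buf = self.pool.take()
        self.held.add(buf)
        return buf

    def close(self, conn):
        # Nothing is reclaimed here: buffers stay valid after close.
        pass

    def teardown(self):
        # Buffers come back only when the owning instance goes away.
        for buf in self.held:
            self.pool.give_back(buf)
        self.held.clear()
```

In the first model, closing a connection reclaims its buffers immediately. In the second, a buffer received on a connection outlives that connection and is reclaimed only when the owner is torn down, which is the tradeoff discussed above.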