On 9/30/24 14:47, Peter Xu wrote:
On Mon, Sep 30, 2024 at 07:20:56PM +0000, Sean Hefty wrote:
I'm sure rsocket has its place with much smaller transfer sizes, but
this is very different.
Is it possible to make rsocket be friendly with large buffers (>4GB) like the VM
use case?
If you can perform large VM migrations using streaming sockets, rsockets is likely usable, but it will involve data copies. The problem is the socket API semantics.
There are rsocket API extensions (riowrite, riomap) to support RDMA write operations. This avoids the data copy at the target, but not the sender. (riowrite follows the socket send semantics on buffer ownership.)
It may be possible to enhance rsockets with MSG_ZEROCOPY or io_uring extensions to enable zero-copy for large transfers, but that's not something I've looked at. True zero copy may require combining MSG_ZEROCOPY with riowrite, but then that moves further away from using traditional socket calls.
Thanks, Sean.
One thing to mention is that QEMU has QIO_CHANNEL_WRITE_FLAG_ZERO_COPY,
which already supports MSG_ZEROCOPY, but only on the sender side and only
when multifd is enabled, because it requires page pinning and alignment,
and it is more challenging to pin an arbitrary buffer than a guest page.
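
For readers less familiar with the sender-side mechanism, here is a minimal
sketch of opting a socket into MSG_ZEROCOPY on Linux and falling back to a
copying send when the buffer is not page aligned. This is not QEMU's code:
the helper names (is_page_aligned, enable_zerocopy, send_maybe_zerocopy) are
hypothetical, and the alignment check only stands in for the constraint
described above; as far as I know the kernel itself does not demand
alignment for MSG_ZEROCOPY.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback values for older libc headers (Linux-specific constants). */
#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60
#endif
#ifndef MSG_ZEROCOPY
#define MSG_ZEROCOPY 0x4000000
#endif

/* Hypothetical helper: 1 if both the address and the (non-zero) length
 * are multiples of the system page size, 0 otherwise. */
static int is_page_aligned(const void *buf, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || len == 0) {
        return 0;
    }
    return ((uintptr_t)buf % (uintptr_t)page == 0) &&
           (len % (size_t)page == 0);
}

/* Opt the socket in once. Returns 0 on success, -1 with errno set if the
 * kernel or the socket type does not support SO_ZEROCOPY. */
static int enable_zerocopy(int fd)
{
    int one = 1;

    return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
}

/* Request zero copy only for page-aligned buffers; otherwise do a normal
 * copying send. *used_zc reports which path was requested.
 *
 * After a zero-copy send the caller must not modify buf until the
 * completion notification has been read from the socket error queue
 * (recvmsg() with MSG_ERRQUEUE); that part is omitted here. */
static ssize_t send_maybe_zerocopy(int fd, const void *buf, size_t len,
                                   int *used_zc)
{
    int flags = is_page_aligned(buf, len) ? MSG_ZEROCOPY : 0;

    *used_zc = (flags != 0);
    return send(fd, buf, len, flags);
}
```

The completion handling that the sketch leaves out is where buffer ownership
gets awkward: the pages stay pinned until the kernel reports that it is done
with them, which is easier to manage for guest pages than for short-lived
buffers.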
Nobody has yet moved forward with zerocopy recv for TCP; there might be
similar challenges, in that normal socket APIs may not work easily on top
of the current iochannel design, but I don't know it well enough to say.
Not sure whether it means there can be a shared goal with QEMU ultimately
supporting better zerocopy via either TCP or RDMA. If that's true, maybe
there's a chance we can move towards rsocket with all the above facilities,
while RDMA can, ideally, run similarly to TCP with the same (to be
enhanced..) iochannel API, so that it can do zerocopy on both sides with
either transport.
What about the testing solution that I mentioned?
Does that satisfy your concerns? Or is there still a gap here that needs
to be met?
- Michael