On 9/29/24 17:26, Michael S. Tsirkin wrote:
On Sun, Sep 29, 2024 at 03:26:58PM -0500, Michael Galaxy wrote:
On 9/29/24 13:14, Michael S. Tsirkin wrote:
On Sat, Sep 28, 2024 at 12:52:08PM -0500, Michael Galaxy wrote:
A bounce buffer defeats the entire purpose of using RDMA in these cases.
When using RDMA for very large transfers like this, the goal here is to map
the entire memory region at once and avoid all CPU interactions (except for
message management within libibverbs) so that the NIC is doing all of the
work.
I'm sure rsocket has its place with much smaller transfer sizes, but this is
very different.
To clarify, are you actively using RDMA-based migration in production? Are you
stepping up to help maintain it?
Yes, Huawei and IONOS have both been contributing here in this email
thread.
They are both using it in production.
- Michael
Well, any plans to work on it? For example, postcopy did not really
do zero copy last time I checked, and there's also a long TODO list.
I apologize, I'm not following the question here. Isn't that what this
thread is about?
So, some background is missing here, perhaps: a few months ago, there was a
proposal to remove native RDMA support from live migration due to concerns
about lack of testability.
Both IONOS and Huawei have stepped up to say that they are using it and are
engaging with the community here. I also proposed transferring
maintainership over to them. (I no longer have any of this hardware, so I
cannot provide testing support anymore.)
During that time, rsocket was proposed as an alternative, but as I have laid
out above, I believe it cannot work for technical reasons.
I also asked earlier in the thread if we can cover the community's testing
concerns using softroce, so that an integration test can be made to work
(presumably through avocado or something similar).
Does that history make sense?
- Michael