RE: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

Hi Peter,

> -----Original Message-----
> From: Peter Xu [mailto:peterx@xxxxxxxxxx]
> Sent: Wednesday, May 22, 2024 6:15 AM
> To: Yu Zhang <yu.zhang@xxxxxxxxx>
> Cc: Michael Galaxy <mgalaxy@xxxxxxxxxx>; Jinpu Wang
> <jinpu.wang@xxxxxxxxx>; Elmar Gerdes <elmar.gerdes@xxxxxxxxx>;
> zhengchuan <zhengchuan@xxxxxxxxxx>; Gonglei (Arei)
> <arei.gonglei@xxxxxxxxxx>; Daniel P. Berrangé <berrange@xxxxxxxxxx>;
> Markus Armbruster <armbru@xxxxxxxxxx>; Zhijian Li (Fujitsu)
> <lizhijian@xxxxxxxxxxx>; qemu-devel@xxxxxxxxxx; Yuval Shaia
> <yuval.shaia.ml@xxxxxxxxx>; Kevin Wolf <kwolf@xxxxxxxxxx>; Prasanna
> Kumar Kalever <prasanna.kalever@xxxxxxxxxx>; Cornelia Huck
> <cohuck@xxxxxxxxxx>; Michael Roth <michael.roth@xxxxxxx>; Prasanna
> Kumar Kalever <prasanna4324@xxxxxxxxx>; Paolo Bonzini
> <pbonzini@xxxxxxxxxx>; qemu-block@xxxxxxxxxx; devel@xxxxxxxxxxxxxxxxx;
> Hanna Reitz <hreitz@xxxxxxxxxx>; Michael S. Tsirkin <mst@xxxxxxxxxx>;
> Thomas Huth <thuth@xxxxxxxxxx>; Eric Blake <eblake@xxxxxxxxxx>; Song
> Gao <gaosong@xxxxxxxxxxx>; Marc-André Lureau
> <marcandre.lureau@xxxxxxxxxx>; Alex Bennée <alex.bennee@xxxxxxxxxx>;
> Wainer dos Santos Moschetta <wainersm@xxxxxxxxxx>; Beraldo Leal
> <bleal@xxxxxxxxxx>; Pannengyuan <pannengyuan@xxxxxxxxxx>;
> Xiexiangyou <xiexiangyou@xxxxxxxxxx>; Fabiano Rosas <farosas@xxxxxxx>
> Subject: Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling
> 
> On Fri, May 17, 2024 at 03:01:59PM +0200, Yu Zhang wrote:
> > Hello Michael and Peter,
> 
> Hi,
> 
> >
> > Exactly, not so compelling, as I did it first only on servers widely
> > used for production in our data center. The network adapters are
> >
> > Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720
> > 2-port Gigabit Ethernet PCIe
> 
> Hmm... I definitely think Jinpu's Mellanox ConnectX-6 looks more reasonable.
> 
> https://lore.kernel.org/qemu-devel/CAMGffEn-DKpMZ4tA71MJYdyemg0Zda15wVAqk81vXtKzx-LfJQ@xxxxxxxxxxxxxx/
> 
> Many thanks to everyone helping with the testing.
> 
> > InfiniBand controller: Mellanox Technologies MT27800 Family
> > [ConnectX-5]
> >
> > which doesn't meet our purpose. I can choose RDMA or TCP for VM
> > migration. RDMA traffic is through InfiniBand and TCP through Ethernet
> > on these two hosts. One is standby while the other is active.
> >
> > Now I'll try on a server with more recent Ethernet and InfiniBand
> > network adapters. One of them has:
> > BCM57414 NetXtreme-E 10Gb/25Gb RDMA Ethernet Controller (rev 01)
> >
> > The comparison between RDMA and TCP on the same NIC could make more
> > sense.
> 
> It looks to me NICs are powerful now, but again as I mentioned I don't think it's
> a reason we need to deprecate rdma, especially if QEMU's rdma migration has
> the chance to be refactored using rsocket.
> 
> Is there anyone who started looking into that direction?  Would it make sense
> we start some PoC now?
> 

My team has finished the PoC refactoring, and it works well.

Progress:
1.  Implemented io/channel-rdma.c.
2.  Added the unit test tests/unit/test-io-channel-rdma.c and verified that it passes.
3.  Removed the original code from migration/rdma.c.
4.  Rewrote the rdma_start_outgoing_migration and rdma_start_incoming_migration logic.
5.  Removed all rdma_xxx functions from migration/ram.c, so that RDMA live migration no longer pollutes the core live migration logic.
6.  Tested RDMA live migration with soft-RoCE (the software RoCE implementation); the migration succeeds.
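
For readers following along, the idea behind items 1 and 5 can be sketched
roughly as below. This is only an illustration, not the code from the PoC:
the names ChannelOps, Channel, socket_ops and channel_roundtrip_demo are
hypothetical, and plain POSIX sockets stand in for the RDMA transport. It
assumes that an rsocket-backed table could point at rsend/rrecv/rclose from
<rdma/rsocket.h> instead, since those mirror the send/recv/close signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical transport table; this is NOT QEMU's QIOChannel API. */
typedef struct ChannelOps {
    ssize_t (*send)(int fd, const void *buf, size_t len, int flags);
    ssize_t (*recv)(int fd, void *buf, size_t len, int flags);
    int (*close)(int fd);
} ChannelOps;

typedef struct Channel {
    int fd;
    const ChannelOps *ops;
} Channel;

/* Stand-in transport: ordinary sockets. An RDMA variant would (by
 * assumption) fill the same table with rsend, rrecv and rclose. */
static const ChannelOps socket_ops = { send, recv, close };

/* Transport-agnostic helpers: the caller never sees which ops are used. */
static int channel_write_all(Channel *c, const void *buf, size_t len)
{
    const char *p = buf;

    assert(c != NULL && c->ops != NULL);
    while (len > 0) {
        ssize_t n = c->ops->send(c->fd, p, len, 0);
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

static int channel_read_all(Channel *c, void *buf, size_t len)
{
    char *p = buf;

    assert(c != NULL && c->ops != NULL);
    while (len > 0) {
        ssize_t n = c->ops->recv(c->fd, p, len, 0);
        if (n <= 0) {
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Sends a small buffer through one channel and reads it back from its
 * peer. Returns 0 if the bytes arrive intact, -1 otherwise. */
int channel_roundtrip_demo(void)
{
    static const char msg[] = "ram page";
    char out[sizeof(msg)] = { 0 };
    int sv[2];
    int ret = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        return -1;
    }

    Channel src = { sv[0], &socket_ops };
    Channel dst = { sv[1], &socket_ops };

    if (channel_write_all(&src, msg, sizeof(msg)) == 0 &&
        channel_read_all(&dst, out, sizeof(out)) == 0 &&
        memcmp(msg, out, sizeof(msg)) == 0) {
        ret = 0;
    }

    src.ops->close(src.fd);
    dst.ops->close(dst.fd);
    return ret;
}
```

The point of the sketch is that the helpers only go through the ops table,
so swapping the transport does not touch the callers. Calling
channel_roundtrip_demo() from a main() is enough to try it.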

We will submit the patchset later.


Regards,
-Gonglei

> Thanks,
> 
> --
> Peter Xu




