Hey, Dave!

On Wed, Jun 05, 2024 at 12:31:56AM +0000, Dr. David Alan Gilbert wrote:
> * Michael Galaxy (mgalaxy@xxxxxxxxxx) wrote:
> > One thing to keep in mind here (despite me not having any hardware
> > to test) was that one of the original goals here in the RDMA
> > implementation was not simply raw throughput nor raw latency, but a
> > lack of CPU utilization in kernel space due to the offload. While
> > it is entirely possible that newer hardware w/ TCP might compete,
> > the significant reductions in CPU usage in the TCP/IP stack were a
> > big win at the time.
> >
> > Just something to consider while you're doing the testing........
>
> I just noticed this thread; some random notes from a somewhat
> fragmented memory of this:
>
> a) Long long ago, I also tried rsocket;
> https://lists.gnu.org/archive/html/qemu-devel/2015-01/msg02040.html
> as I remember the library was quite flaky at the time.

Hmm, interesting.  It also looks like there's a thread doing rpoll().

Btw, not sure whether you noticed, but there's a series posted for the
latest rsocket conversion here:

https://lore.kernel.org/r/1717503252-51884-1-git-send-email-arei.gonglei@xxxxxxxxxx

I hope Lei and his team have tested >4G mem; otherwise it's definitely
worth checking.  Lei also mentioned some rsocket bugs they found in the
cover letter, but I'm not sure what those are about.

> b) A lot of the complexity in the rdma migration code comes from
> emulating a stream to carry the migration control data and
> interleaving that with the actual RAM copy.  I believe the original
> design used a separate TCP socket for the control data, and just
> used the RDMA for the data - that should be a lot simpler (but alas
> was rejected in review early on)
>
> c) I can't remember the last benchmarks I did; but I think I did
> manage to beat RDMA with multifd; but yes, multifd does eat host CPU
> whereas RDMA barely uses a whisper.

I think my first impression on this matter came from you on this
one. :)

> d) The 'zero-copy-send' option in migrate may well get some of that
> CPU time back; but if I remember we were still bottlenecked on the
> receive side. (I can't remember if zero-copy-send worked with
> multifd?)

Yes, and zero-copy currently requires multifd.  I think that's because
we didn't want to complicate the header processing in the migration
stream, where the data may not be page aligned.

> e) Someone made a good suggestion (sorry can't remember who) - that
> the RDMA migration structure was the wrong way around - it should be
> the destination which initiates an RDMA read, rather than the source
> doing a write; then things might become a LOT simpler; you just need
> to send page ranges to the destination and it can pull it.
> That might work nicely for postcopy.

I'm not sure whether that would still be a problem if the rdma recv
side were based on zero-copy.  It becomes a question of whether
atomicity can be guaranteed, so that the guest vCPUs never see a
partially copied page while DMAs are in flight.

UFFDIO_COPY (or a friend) is currently the only solution for that.

Thanks,

-- 
Peter Xu