Hi Peter, Hi Chuan,

On Thu, May 9, 2024 at 4:14 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Thu, May 09, 2024 at 04:58:34PM +0800, Zheng Chuan via wrote:
> > That's a good news to see the socket abstraction for RDMA!
> > When I was developed the series above, the most pain is the RDMA migration has no QIOChannel abstraction and i need to take a 'fake channel'
> > for it which is awkward in code implementation.
> > So, as far as I know, we can do this by
> > i. the first thing is that we need to evaluate the rsocket is good enough to satisfy our QIOChannel fundamental abstraction
> > ii. if it works right, then we will continue to see if it can give us opportunity to hide the detail of rdma protocol
> >     into rsocket by remove most of code in rdma.c and also some hack in migration main process.
> > iii. implement the advanced features like multi-fd and multi-uri for rdma migration.
> >
> > Since I am not familiar with rsocket, I need some times to look at it and do some quick verify with rdma migration based on rsocket.
> > But, yes, I am willing to involved in this refactor work and to see if we can make this migration feature more better:)
>
> Based on what we have now, it looks like we'd better halt the deprecation
> process a bit, so I think we shouldn't need to rush it at least in 9.1
> then, and we'll need to see how it goes on the refactoring.
>
> It'll be perfect if rsocket works, otherwise supporting multifd with little
> overhead / exported APIs would also be a good thing in general with
> whatever approach.  And obviously all based on the facts that we can get
> resources from companies to support this feature first.
>
> Note that so far nobody yet compared with rdma v.s. nic perf, so I hope if
> any of us can provide some test results please do so.  Many people are
> saying RDMA is better, but I yet didn't see any numbers comparing it with
> modern TCP networks.  I don't want to have old impressions floating around
> even if things might have changed..
> When we have consolidated results, we
> should share them out and also reflect that in QEMU's migration docs when a
> rdma document page is ready.

I also ran some tests with a Mellanox ConnectX-6 100G RoCE NIC. The results
are mixed: with fewer than 3 streams native Ethernet is faster, and with
more than 3 streams rsocket performs better.

root@x4-right:~# iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 44214 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 52.9 GBytes 45.4 Gbits/sec
root@x4-right:~# iperf -c 1.1.1.16 -P 2
[ 3] local 1.1.1.15 port 33118 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 33130 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 4.00 MByte (default)
------------------------------------------------------------
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0001 sec 45.0 GBytes 38.7 Gbits/sec
[ 4] 0.0000-10.0000 sec 43.9 GBytes 37.7 Gbits/sec
[SUM] 0.0000-10.0000 sec 88.9 GBytes 76.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.172/0.189/0.205/0.172 ms (tot/err) = 2/0
root@x4-right:~# iperf -c 1.1.1.16 -P 4
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 5] local 1.1.1.15 port 50748 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 50734 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 50764 connected with 1.1.1.16 port 5001
[ 3] local 1.1.1.15 port 50730 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 24.7 GBytes 21.2 Gbits/sec
[ 3] 0.0000-10.0004 sec 23.6 GBytes 20.3 Gbits/sec
[ 4] 0.0000-10.0000 sec 27.8 GBytes 23.9 Gbits/sec
[ 5] 0.0000-10.0000 sec 28.0 GBytes 24.0 Gbits/sec
[SUM] 0.0000-10.0000 sec 104 GBytes 89.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.104/0.156/0.204/0.124 ms (tot/err) = 4/0
root@x4-right:~# iperf -c 1.1.1.16 -P 8
[ 4] local 1.1.1.15 port 55588 connected with 1.1.1.16 port 5001
[ 5] local 1.1.1.15 port 55600 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 325 KByte (default)
------------------------------------------------------------
[ 10] local 1.1.1.15 port 55628 connected with 1.1.1.16 port 5001
[ 15] local 1.1.1.15 port 55648 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 55620 connected with 1.1.1.16 port 5001
[ 3] local 1.1.1.15 port 55584 connected with 1.1.1.16 port 5001
[ 14] local 1.1.1.15 port 55644 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 55610 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0015 sec 8.47 GBytes 7.27 Gbits/sec
[ 4] 0.0000-10.0011 sec 8.62 GBytes 7.40 Gbits/sec
[ 7] 0.0000-10.0000 sec 18.1 GBytes 15.5 Gbits/sec
[ 14] 0.0000-10.0000 sec 8.69 GBytes 7.46 Gbits/sec
[ 5] 0.0000-10.0006 sec 18.5 GBytes 15.9 Gbits/sec
[ 10] 0.0000-10.0006 sec 16.1 GBytes 13.9 Gbits/sec
[ 3] 0.0000-10.0000 sec 17.1 GBytes 14.6 Gbits/sec
[ 15] 0.0000-10.0016 sec 8.54 GBytes 7.34 Gbits/sec
[SUM] 0.0000-10.0017 sec 104 GBytes 89.4 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 0.049/0.095/0.213/0.062 ms (tot/err) = 8/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 45596 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 37.8 GBytes 32.5 Gbits/sec
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 2
[ 4] local 1.1.1.15 port 46782 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 43237 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0000 sec 37.5 GBytes 32.2 Gbits/sec
[ 3] 0.0000-10.0000 sec 40.7 GBytes 34.9 Gbits/sec
[SUM] 0.0000-10.0000 sec 78.2 GBytes 67.2 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.819/6.579/7.340/7.340 ms (tot/err) = 2/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 4
[ 4] local 1.1.1.15 port 60385 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 55203 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 35084 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 37253 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 28.4 GBytes 24.4 Gbits/sec
[ 4] 0.0000-10.0000 sec 28.3 GBytes 24.3 Gbits/sec
[ 7] 0.0000-10.0000 sec 28.4 GBytes 24.4 Gbits/sec
[ 3] 0.0000-10.0001 sec 28.2 GBytes 24.3 Gbits/sec
[SUM] 0.0000-10.0001 sec 113 GBytes 97.3 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.311/7.579/10.019/4.165 ms (tot/err) = 4/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 8
[ 8] local 1.1.1.15 port 33684 connected with 1.1.1.16 port 5001
[ 10] local 1.1.1.15 port 40620 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 56988 connected with 1.1.1.16 port 5001
[ 4] local 1.1.1.15 port 51139 connected with 1.1.1.16 port 5001
[ 12] local 1.1.1.15 port 44712 connected with 1.1.1.16 port 5001
[ 5] local 1.1.1.15 port 50838 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 51334 connected with 1.1.1.16 port 5001
[ 9] local 1.1.1.15 port 40611 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 5] 0.0000-10.0001 sec 13.9 GBytes 11.9 Gbits/sec
[ 12] 0.0000-10.0001 sec 13.8 GBytes 11.9 Gbits/sec
[ 10] 0.0000-10.0001 sec 13.9 GBytes 11.9 Gbits/sec
[ 9] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 6] 0.0000-10.0000 sec 13.9 GBytes 11.9 Gbits/sec
[ 8] 0.0000-10.0000 sec 13.8 GBytes 11.9 Gbits/sec
[ 4] 0.0000-10.0001 sec 13.8 GBytes 11.9 Gbits/sec
[SUM] 0.0000-10.0001 sec 111 GBytes 95.1 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.973/10.699/15.943/4.251 ms (tot/err) = 8/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 1
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 36960 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0000-10.0000 sec 41.1 GBytes 35.3 Gbits/sec
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 2
[ 3] local 1.1.1.15 port 32799 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 4] local 1.1.1.15 port 35912 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 4] 0.0000-10.0000 sec 36.6 GBytes 31.4 Gbits/sec
[ 3] 0.0000-10.0000 sec 36.6 GBytes 31.4 Gbits/sec
[SUM] 0.0000-10.0000 sec 73.2 GBytes 62.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.172/5.842/6.512/6.512 ms (tot/err) = 2/0
root@x4-right:~# LD_PRELOAD=/usr/lib/x86_64-linux-gnu/rsocket/librspreload.so iperf -c 1.1.1.16 -P 4
[ 4] local 1.1.1.15 port 53311 connected with 1.1.1.16 port 5001
------------------------------------------------------------
Client connecting to 1.1.1.16, TCP port 5001
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 3] local 1.1.1.15 port 37243 connected with 1.1.1.16 port 5001
[ 7] local 1.1.1.15 port 60801 connected with 1.1.1.16 port 5001
[ 6] local 1.1.1.15 port 49694 connected with 1.1.1.16 port 5001
[ ID] Interval Transfer Bandwidth
[ 6] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[ 7] 0.0000-10.0000 sec 28.2 GBytes 24.3 Gbits/sec
[ 3] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[ 4] 0.0000-10.0000 sec 28.2 GBytes 24.2 Gbits/sec
[SUM] 0.0000-10.0000 sec 113 GBytes 96.9 Gbits/sec
[ CT] final connect times (min/avg/max/stdev) = 5.570/7.762/10.045/4.265 ms (tot/err) = 4/0
root@x4-right:~#

>
> Chuan, please check the whole thread discussion, it may help to understand
> what we are looking for on rdma migrations [1].  Meanwhile please feel free
> to sync with Jinpu's team and see how to move forward with such a project.

We are happy to work with the community to improve rdma migration.

>
> [1] https://lore.kernel.org/qemu-devel/87frwatp7n.fsf@xxxxxxx/
>
> Thanks,

Regards!

>
> --
> Peter Xu
>
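
P.S. For anyone who wants to tabulate further runs in the same way, below is a minimal sketch of how the aggregate bandwidth could be pulled out of iperf 2 text output like the runs above. The function name `aggregate_gbits` and the regular expression are hypothetical helpers for illustration only, not part of iperf or QEMU, and the sketch assumes every rate is reported in Gbits/sec, as it is in these runs.

```python
import re

# Matches iperf 2 result rows such as
#   [SUM] 0.0000-10.0000 sec 88.9 GBytes 76.4 Gbits/sec
#   [ 3] 0.0000-10.0000 sec 52.9 GBytes 45.4 Gbits/sec
# "connected with" rows and the "[ ID]" / "[ CT]" rows do not match.
ROW = re.compile(
    r"\[\s*(?P<id>SUM|\d+)\]\s+[\d.]+-[\d.]+\s+sec\s+"
    r"[\d.]+\s+\w+\s+(?P<rate>[\d.]+)\s+Gbits/sec"
)


def aggregate_gbits(output: str) -> float:
    """Return the aggregate bandwidth of one iperf run in Gbits/sec.

    Uses the [SUM] row when there is one; otherwise adds up the
    per-stream rows (a single-stream run prints no [SUM] row).
    """
    per_stream_total = 0.0
    for match in ROW.finditer(output):
        if match.group("id") == "SUM":
            return float(match.group("rate"))
        per_stream_total += float(match.group("rate"))
    return per_stream_total


if __name__ == "__main__":
    # Sample rows copied from the two-stream native TCP run above.
    sample = """\
[ 3] 0.0000-10.0001 sec 45.0 GBytes 38.7 Gbits/sec
[ 4] 0.0000-10.0000 sec 43.9 GBytes 37.7 Gbits/sec
[SUM] 0.0000-10.0000 sec 88.9 GBytes 76.4 Gbits/sec
"""
    print(aggregate_gbits(sample))
```

Feeding it the output of each `-P N` run, with and without `LD_PRELOAD`, gives one number per run that can be compared directly.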