> -----Original Message----- > From: Peter Xu [mailto:peterx@xxxxxxxxxx] > Sent: Wednesday, June 5, 2024 10:19 PM > To: Gonglei (Arei) <arei.gonglei@xxxxxxxxxx> > Cc: qemu-devel@xxxxxxxxxx; yu.zhang@xxxxxxxxx; mgalaxy@xxxxxxxxxx; > elmar.gerdes@xxxxxxxxx; zhengchuan <zhengchuan@xxxxxxxxxx>; > berrange@xxxxxxxxxx; armbru@xxxxxxxxxx; lizhijian@xxxxxxxxxxx; > pbonzini@xxxxxxxxxx; mst@xxxxxxxxxx; Xiexiangyou > <xiexiangyou@xxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; lixiao (H) > <lixiao91@xxxxxxxxxx>; jinpu.wang@xxxxxxxxx; Wangjialin > <wangjialin23@xxxxxxxxxx>; Fabiano Rosas <farosas@xxxxxxx> > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API > > On Wed, Jun 05, 2024 at 10:09:43AM +0000, Gonglei (Arei) wrote: > > Hi Peter, > > > > > -----Original Message----- > > > From: Peter Xu [mailto:peterx@xxxxxxxxxx] > > > Sent: Wednesday, June 5, 2024 3:32 AM > > > To: Gonglei (Arei) <arei.gonglei@xxxxxxxxxx> > > > Cc: qemu-devel@xxxxxxxxxx; yu.zhang@xxxxxxxxx; > mgalaxy@xxxxxxxxxx; > > > elmar.gerdes@xxxxxxxxx; zhengchuan <zhengchuan@xxxxxxxxxx>; > > > berrange@xxxxxxxxxx; armbru@xxxxxxxxxx; lizhijian@xxxxxxxxxxx; > > > pbonzini@xxxxxxxxxx; mst@xxxxxxxxxx; Xiexiangyou > > > <xiexiangyou@xxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; lixiao (H) > > > <lixiao91@xxxxxxxxxx>; jinpu.wang@xxxxxxxxx; Wangjialin > > > <wangjialin23@xxxxxxxxxx>; Fabiano Rosas <farosas@xxxxxxx> > > > Subject: Re: [PATCH 0/6] refactor RDMA live migration based on > > > rsocket API > > > > > > Hi, Lei, Jialin, > > > > > > Thanks a lot for working on this! > > > > > > I think we'll need to wait a bit on feedbacks from Jinpu and his > > > team on RDMA side, also Daniel for iochannels. Also, please > > > remember to copy Fabiano Rosas in any relevant future posts. We'd > > > also like to know whether he has any comments too. I have him copied in > this reply. > > > > > > On Tue, Jun 04, 2024 at 08:14:06PM +0800, Gonglei wrote: > > > > From: Jialin Wang <wangjialin23@xxxxxxxxxx> > > > > > > > > Hi, > > > > > > > > This patch series attempts to refactor RDMA live migration by > > > > introducing a new QIOChannelRDMA class based on the rsocket API. > > > > > > > > The /usr/include/rdma/rsocket.h provides a higher level rsocket > > > > API that is a 1-1 match of the normal kernel 'sockets' API, which > > > > hides the detail of rdma protocol into rsocket and allows us to > > > > add support for some modern features like multifd more easily. > > > > > > > > Here is the previous discussion on refactoring RDMA live migration > > > > using the rsocket API: > > > > > > > > https://lore.kernel.org/qemu-devel/20240328130255.52257-1-philmd@l > > > > inar > > > > o.org/ > > > > > > > > We have encountered some bugs when using rsocket and plan to > > > > submit them to the rdma-core community. > > > > > > > > In addition, the use of rsocket makes our programming more > > > > convenient, but it must be noted that this method introduces > > > > multiple memory copies, which can be imagined that there will be a > > > > certain performance degradation, hoping that friends with RDMA > > > > network cards can help verify, > > > thank you! > > > > > > It'll be good to elaborate if you tested it in-house. What people > > > should expect on the numbers exactly? Is that okay from Huawei's POV? > > > > > > Besides that, the code looks pretty good at a first glance to me. > > > Before others chim in, here're some high level comments.. > > > > > > Firstly, can we avoid using coroutine when listen()? Might be > > > relevant when I see that rdma_accept_incoming_migration() runs in a > > > loop to do raccept(), but would that also hang the qemu main loop > > > even with the coroutine, before all channels are ready? I'm not a > > > coroutine person, but I think the hope is that we can make dest QEMU > > > run in a thread in the future just like the src QEMU, so the less coroutine > the better in this path. > > > > > > > Because rsocket is set to non-blocking, raccept will return EAGAIN > > when no connection is received, coroutine will yield, and will not hang the > qemu main loop. > > Ah that's ok. And also I just noticed it may not be a big deal either as long as > we're before migration_incoming_process(). > > I'm wondering whether it can do it similarly like what we do with sockets in > qio_net_listener_set_client_func_full(). After all, rsocket wants to mimic the > socket API. It'll make sense if rsocket code tries to match with socket, or > even reuse. > Actually we tried this solution, but it didn't work. Pls see patch 3/6 Known limitations: For a blocking rsocket fd, if we use io_create_watch to wait for POLLIN or POLLOUT events, since the rsocket fd is blocking, we cannot determine when it is not ready to read/write as we can with non-blocking fds. Therefore, when an event occurs, it will occurs always, potentially leave the qemu hanging. So we need be cautious to avoid hanging when using io_create_watch . Regards, -Gonglei > > > > > I think I also left a comment elsewhere on whether it would be > > > possible to allow iochannels implement their own poll() functions to > > > avoid the per-channel poll thread that is proposed in this series. > > > > > > https://lore.kernel.org/r/ZldY21xVExtiMddB@x1n > > > > > > > We noticed that, and it's a big operation. I'm not sure that's a better way. > > > > > Personally I think even with the thread proposal it's better than > > > the old rdma code, but I just still want to double check with you > > > guys. E.g., maybe that just won't work at all? Again, that'll also > > > be based on the fact that we move migration incoming into a thread > > > first to keep the dest QEMU main loop intact, I think, but I hope we > > > will reach that irrelevant of rdma, IOW it'll be nice to happen even earlier if > possible. > > > > > Yep. This is a fairly big change, I wonder what other people's suggestions > are? > > Yes we can wait for others' opinions. And btw I'm not asking for it and I don't > think it'll be a blocker for this approach to land, as I said this is better than the > current code so it's definitely an improvement to me. > > I'm purely curious, because if you're not going to do it for rdma, maybe > someday I'll try to do that, and I want to know what "big change" could be as I > didn't dig further. It may help me by sharing what issues you've found. > > Thanks, > > -- > Peter Xu