Lei,

On Wed, May 29, 2024 at 02:43:46AM +0000, Gonglei (Arei) wrote:
> For rdma programming, the current mainstream implementation is to use
> rdma_cm to establish a connection, and then use verbs to transmit data.
> rdma_cm and ibverbs create two FDs respectively. The two FDs have
> different responsibilities. rdma_cm fd is used to notify connection
> establishment events, and verbs fd is used to notify new CQEs. When
> poll/epoll monitoring is directly performed on the rdma_cm fd, only a
> pollin event can be monitored, which means that an rdma_cm event
> occurs. When the verbs fd is directly polled/epolled, only the pollin
> event can be listened, which indicates that a new CQE is generated.
>
> Rsocket is a sub-module attached to the rdma_cm library and provides
> rdma calls that are completely similar to socket interfaces. However,
> this library returns only the rdma_cm fd for listening to link
> setup-related events and does not expose the verbs fd (readable and
> writable events for listening to data). Only the rpoll interface provided
> by the RSocket can be used to listen to related events. However, QEMU
> uses the ppoll interface to listen to the rdma_cm fd (gotten by raccept
> API). And cannot listen to the verbs fd event. Only some hacking methods
> can be used to address this problem. Do you guys have any ideas? Thanks.

I saw that you mentioned this elsewhere:

> Right. But the question is QEMU do not use rpoll but gilb's ppoll. :(

So what I'm thinking may not make much sense: as I mentioned, I don't
think I know rdma at all, and my idea also involves coroutine stuff,
which I don't know well either.  But just in case it sheds some light
in some form.

IIUC we block on iochannels like this, for both read and write:

    if (len == QIO_CHANNEL_ERR_BLOCK) {
        if (qemu_in_coroutine()) {
            qio_channel_yield(ioc, G_IO_XXX);
        } else {
            qio_channel_wait(ioc, G_IO_XXX);
        }
        continue;
    }

One thing I'm wondering is whether we can provide a new feature bit for
qiochannel, e.g., QIO_CHANNEL_FEATURE_POLL, so that the iochannel can
define its own poll routine rather than using the default when possible.

I think it may not work if it's in a coroutine, as I guess that'll block
other fds from being woken up.  Hence it should look like this:

    if (len == QIO_CHANNEL_ERR_BLOCK) {
        if (qemu_in_coroutine()) {
            qio_channel_yield(ioc, G_IO_XXX);
        } else if (qio_channel_has_feature(ioc, QIO_CHANNEL_FEATURE_POLL)) {
            qio_channel_poll(ioc, G_IO_XXX);
        } else {
            qio_channel_wait(ioc, G_IO_XXX);
        }
        continue;
    }

Maybe we even want to forbid such a channel from being used in a
coroutine at all, as when QIO_CHANNEL_FEATURE_POLL is set it may mean
that this iochannel simply won't work with poll(), like in rdma's use
case.

Then the rdma iochannel can implement qio_channel_poll() using rpoll().

There's one other dependent issue here in that I _think_ the migration
recv side is still in a coroutine.. so we may need to move that into a
thread first.  IIRC we don't yet have a major blocker to do that, but I
didn't further check either.  I've put that issue aside just to see
whether this may or may not make sense.

Thanks,

-- 
Peter Xu
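
For illustration only, here is a standalone toy model of the dispatch
described above.  It does not use the real QIOChannel code: every Demo*
/ demo_* name is hypothetical, DEMO_FEATURE_POLL merely stands in for
the proposed QIO_CHANNEL_FEATURE_POLL, and the poll hook is a stub where
an rdma channel would presumably call rpoll().  It only shows which wait
path would be picked under those assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are real QEMU definitions. */
enum { DEMO_FEATURE_POLL = 1u << 0 };  /* models QIO_CHANNEL_FEATURE_POLL */

typedef enum {
    DEMO_WAIT_YIELD,    /* coroutine path, i.e. qio_channel_yield() */
    DEMO_WAIT_CUSTOM,   /* proposed path, i.e. qio_channel_poll() */
    DEMO_WAIT_DEFAULT   /* existing path, i.e. qio_channel_wait() */
} DemoWaitPath;

typedef struct DemoChannel DemoChannel;
struct DemoChannel {
    unsigned features;
    /* Channel-specific poll hook; an rdma channel might wrap rpoll(). */
    int (*poll)(DemoChannel *ioc, int events);
    int poll_calls;     /* counts hook invocations, for the demo only */
};

static bool demo_has_feature(const DemoChannel *ioc, unsigned feature)
{
    return (ioc->features & feature) != 0;
}

/* Pick how to wait after an operation reported "would block". */
static DemoWaitPath demo_wait_for_io(DemoChannel *ioc, int events,
                                     bool in_coroutine)
{
    assert(ioc != NULL);

    if (in_coroutine) {
        /* A blocking custom poll would stall other fds, so always yield. */
        return DEMO_WAIT_YIELD;
    }
    if (demo_has_feature(ioc, DEMO_FEATURE_POLL) && ioc->poll != NULL) {
        ioc->poll(ioc, events);
        return DEMO_WAIT_CUSTOM;
    }
    return DEMO_WAIT_DEFAULT;
}

/* Stub hook: a real implementation would block in rpoll() here. */
static int demo_rdma_poll(DemoChannel *ioc, int events)
{
    (void)events;
    ioc->poll_calls++;
    return 0;
}
```

A channel created with DEMO_FEATURE_POLL and demo_rdma_poll as its hook
takes the custom path outside a coroutine, while a channel without the
bit falls back to the default wait.  Whether the coroutine branch should
instead be rejected for such channels is left open here, as it is above.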