On Mon, Jan 10, 2022 at 11:44 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
>
> On Mon, Jan 10, 2022 at 11:24:40PM +0800, Yongji Xie wrote:
> > On Mon, Jan 10, 2022 at 11:10 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > >
> > > On Mon, Jan 10, 2022 at 09:54:08PM +0800, Yongji Xie wrote:
> > > > On Mon, Jan 10, 2022 at 8:57 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> > > > >
> > > > > On Mon, Aug 30, 2021 at 10:17:24PM +0800, Xie Yongji wrote:
> > > > > > This series introduces a framework that makes it possible to implement
> > > > > > software-emulated vDPA devices in userspace. And to make the device
> > > > > > emulation more secure, the emulated vDPA device's control path is handled
> > > > > > in the kernel and only the data path is implemented in userspace.
> > > > > >
> > > > > > Since the emulated vDPA device's control path is handled in the kernel,
> > > > > > a message mechanism is introduced to make userspace aware of the data
> > > > > > path related changes. Userspace can use read()/write() to receive/reply
> > > > > > to the control messages.
> > > > > >
> > > > > > In the data path, the core is mapping the DMA buffer into the VDUSE daemon's
> > > > > > address space, which can be implemented in different ways depending on
> > > > > > the vdpa bus to which the vDPA device is attached.
> > > > > >
> > > > > > In the virtio-vdpa case, we implement an MMU-based software IOTLB with a
> > > > > > bounce-buffering mechanism to achieve that. And in the vhost-vdpa case, the DMA
> > > > > > buffer resides in a userspace memory region which can be shared with the
> > > > > > VDUSE userspace process via transferring the shmfd.
> > > > > >
> > > > > > The details and our use case are shown below:
> > > > > >
> > > > > > ------------------------    -------------------------   ----------------------------------------------
> > > > > > |            Container |    |              QEMU(VM) |   |                               VDUSE daemon |
> > > > > > |       ---------      |    |  -------------------  |   | ------------------------- ---------------- |
> > > > > > |       |dev/vdx|      |    |  |/dev/vhost-vdpa-x|  |   | | vDPA device emulation | | block driver | |
> > > > > > ------------+-----------     -----------+------------   -------------+----------------------+---------
> > > > > >             |                           |                            |                      |
> > > > > >             |                           |                            |                      |
> > > > > > ------------+---------------------------+----------------------------+----------------------+---------
> > > > > > |    | block device |           |  vhost device |            | vduse driver |          | TCP/IP |    |
> > > > > > |    -------+--------           --------+--------            -------+--------          -----+----    |
> > > > > > |           |                           |                           |                       |        |
> > > > > > | ----------+----------       ----------+-----------         -------+-------                |        |
> > > > > > | | virtio-blk driver |       |  vhost-vdpa driver |         | vdpa device |                |        |
> > > > > > | ----------+----------       ----------+-----------         -------+-------                |        |
> > > > > > |           |      virtio bus           |                           |                       |        |
> > > > > > |   --------+----+-----------           |                           |                       |        |
> > > > > > |                |                      |                           |                       |        |
> > > > > > |      ----------+----------            |                           |                       |        |
> > > > > > |      | virtio-blk device |            |                           |                       |        |
> > > > > > |      ----------+----------            |                           |                       |        |
> > > > > > |                |                      |                           |                       |        |
> > > > > > |     -----------+-----------           |                           |                       |        |
> > > > > > |     |  virtio-vdpa driver |           |                           |                       |        |
> > > > > > |     -----------+-----------           |                           |                       |        |
> > > > > > |                |                      |                           |    vdpa bus           |        |
> > > > > > |     -----------+----------------------+---------------------------+------------           |        |
> > > > > > |                                                                                        ---+---     |
> > > > > > -----------------------------------------------------------------------------------------| NIC |------
> > > > > >                                                                                          ---+---
> > > > > >                                                                                             |
> > > > > >                                                                                    ---------+---------
> > > > > >                                                                                    | Remote Storages |
> > > > > >                                                                                    -------------------
> > > > > >
> > > > > > We make use of it to implement a block device connecting to
> > > > > > our distributed storage, which can be used both in containers and
> > > > > > VMs. Thus, we can have a unified technology stack in these two cases.
> > > > > >
> > > > > > To test it with null-blk:
> > > > > >
> > > > > >   $ qemu-storage-daemon \
> > > > > >       --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
> > > > > >       --monitor chardev=charmonitor \
> > > > > >       --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
> > > > > >       --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128
> > > > > >
> > > > > > The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
> > > > >
> > > > > It's been half a year - any plans to upstream this?
> > > >
> > > > Yeah, this is on my to-do list this month.
> > > >
> > > > Sorry for taking so long... I've been working on another project
> > > > enabling userspace RDMA with VDUSE for the past few months. So I
> > > > didn't have much time for this. Anyway, I will submit the first
> > > > version as soon as possible.
> > > >
> > > > Thanks,
> > > > Yongji
> > >
> > > Oh fun. You mean like virtio-rdma? Or RDMA as a backend for regular
> > > virtio?
> > >
> >
> > Yes, like virtio-rdma. Then we can develop something like a userspace
> > rxe, siw, or custom protocol with VDUSE.
> >
> > Thanks,
> > Yongji
>
> Would be interesting to see the spec for that.

Will send it ASAP.

> The issues with RDMA revolved around the fact that current
> apps tend to either use non-standard protocols for connection
> establishment or use UD where there's IIRC no standard
> at all. So QP numbers are hard to virtualize.
> Similarly many use LIDs directly with the same effect.
> GUIDs might be virtualizable but no one went to the effort.
>

Actually, we aim to emulate a soft RDMA device on top of a normal NIC
(without using any RDMA capability) rather than virtualizing a physical
RDMA NIC into several vRDMA devices.
If so, I think we won't have those issues, right?

> To say nothing about the interaction with memory overcommit.
>

I don't follow you here. Could you give me more details?

Thanks,
Yongji