This series introduces a framework that can be used to implement vDPA
devices in a userspace program. The work consists of two parts: control
path emulation and data path offloading.

In the control path, the VDUSE driver uses a message mechanism to forward
actions (get/set features, get/set status, get/set config space and set
virtqueue state) from the virtio-vdpa driver to userspace. Userspace can
use read()/write() to receive and reply to those control messages.

In the data path, the VDUSE driver implements an MMU-based on-chip IOMMU
driver which supports both direct mapping and indirect mapping with a
bounce buffer. Userspace can then access the IOVA space via mmap(). In
addition, the eventfd mechanism is used to trigger interrupts and forward
virtqueue kicks.

The details and our use case are shown below:

------------------------    -----------------------------------------------------------
|         APP          |    |                          QEMU                           |
|      ---------       |    |  --------------------    -------------------+<-->+------|
|      |dev/vdx|       |    |  | device emulation |    | virtio dataplane |    | BDS ||
------------+-----------    -----------+-----------------------+-----------------+-----
            |                          |                       |                 |
            |                          | emulating             | offloading      |
------------+--------------------------+-----------------------+-----------------+------
|    | block device |          | vduse driver |         | vdpa device |     | TCP/IP | |
|    -------+--------          --------+-------         +------+-------     -----+---- |
|           |                          |                |      |                 |     |
|           |                          ------------------      |                 |     |
| ----------+----------                              ----------+-----------      |     |
| | virtio-blk driver |                              | virtio-vdpa driver |      |     |
| ----------+----------                              ----------+-----------      |     |
|           |                                                  |                 |     |
|           ----------------------------------------------------             ---+---  |
------------------------------------------------------------------------------| NIC |---
                                                                              ---+---
                                                                                 |
                                                                        ---------+---------
                                                                        | Remote Storages |
                                                                        -------------------

We use this framework to implement a block device that connects to our
distributed storage and can be used in containers and on bare metal.
Compared with a qemu-nbd solution, this solution has higher performance,
and it gives us a unified technology stack for remote storage in both VMs
and containers.

To test it with a host disk (e.g. /dev/sdx):

  $ qemu-storage-daemon \
      --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
      --monitor chardev=charmonitor \
      --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/sdx,node-name=disk0 \
      --export vduse-blk,id=test,node-name=disk0,writable=on,vduse-id=1,num-queues=16,queue-size=128

The qemu-storage-daemon can be found at https://github.com/bytedance/qemu/tree/vduse
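To make the control path more concrete, here is a rough sketch of what the
userspace side of the message loop can look like. Note that the device node
path, the vduse_dev_msg layout and the message type names below are made up
for illustration; the actual ABI is defined by include/uapi/linux/vduse.h
in this series, and virtqueue kicks/interrupts go over eventfds rather than
this message channel:

/*
 * Sketch of a VDUSE userspace control-message loop.
 * The node path, message layout and type values here are
 * placeholders, not the real uapi in include/uapi/linux/vduse.h.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

struct vduse_dev_msg {		/* hypothetical layout */
	uint32_t type;		/* which action is being forwarded */
	uint32_t request_id;	/* echoed back in the reply */
	uint64_t payload;	/* request/reply specific data */
};

enum {				/* hypothetical message types */
	VDUSE_GET_FEATURES = 1,
	VDUSE_SET_STATUS = 2,
};

int main(void)
{
	struct vduse_dev_msg msg;
	int fd;

	fd = open("/dev/vduse/vduse0", O_RDWR);	/* assumed node path */
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Receive control messages forwarded by the vduse driver ... */
	while (read(fd, &msg, sizeof(msg)) == sizeof(msg)) {
		switch (msg.type) {
		case VDUSE_GET_FEATURES:
			msg.payload = 0;	/* device features to offer */
			break;
		case VDUSE_SET_STATUS:
			/* react to the driver's status change */
			break;
		}
		/* ... and reply to each one with write(). */
		if (write(fd, &msg, sizeof(msg)) != sizeof(msg))
			break;
	}

	close(fd);
	return 0;
}

The data path follows the same style: userspace mmap()s the IOVA space
exposed by the MMU-based IOMMU driver and uses eventfds for virtqueue
kicks and interrupts.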
Future work:
- Improve performance (e.g. zero copy implementation in datapath)
- Config interrupt support
- Userspace library (find a way to reuse device emulation code in
  qemu/rust-vmm)

Xie Yongji (4):
  mm: export zap_page_range() for driver use
  vduse: Introduce VDUSE - vDPA Device in Userspace
  vduse: grab the module's references until there is no vduse device
  vduse: Add memory shrinker to reclaim bounce pages

 drivers/vdpa/Kconfig                 |    8 +
 drivers/vdpa/Makefile                |    1 +
 drivers/vdpa/vdpa_user/Makefile      |    5 +
 drivers/vdpa/vdpa_user/eventfd.c     |  221 ++++++
 drivers/vdpa/vdpa_user/eventfd.h     |   48 ++
 drivers/vdpa/vdpa_user/iova_domain.c |  488 ++++++++++++
 drivers/vdpa/vdpa_user/iova_domain.h |  104 +++
 drivers/vdpa/vdpa_user/vduse.h       |   66 ++
 drivers/vdpa/vdpa_user/vduse_dev.c   | 1081 ++++++++++++++++++++++++++
 include/uapi/linux/vduse.h           |   85 ++
 mm/memory.c                          |    1 +
 11 files changed, 2108 insertions(+)
 create mode 100644 drivers/vdpa/vdpa_user/Makefile
 create mode 100644 drivers/vdpa/vdpa_user/eventfd.c
 create mode 100644 drivers/vdpa/vdpa_user/eventfd.h
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.c
 create mode 100644 drivers/vdpa/vdpa_user/iova_domain.h
 create mode 100644 drivers/vdpa/vdpa_user/vduse.h
 create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
 create mode 100644 include/uapi/linux/vduse.h

-- 
2.25.1