[It may be necessary to remove virtio-dev@xxxxxxxxxxxxxxxxxxxx from CC if you are a non-TC member.]

Hi,
Some modern networking applications bypass the kernel network stack so that rx/tx rings and DMA buffers can be mapped directly. This is typical in DPDK applications, where virtio-net is currently one of several NIC choices.

Existing virtio-net implementations are not optimized for VM-to-VM DPDK-style networking. The following outline describes a zero-copy virtio-net solution for VM-to-VM networking.

Thanks to Paolo Bonzini for the Shared Buffers BAR idea.

Use case
--------
Two VMs on the same host need to communicate as efficiently as possible (e.g. the sole purpose of the VMs is to do network I/O). Applications running inside the VMs implement virtio-net in userspace so they have full control over rx/tx rings and data buffer placement.

Performance requirements are a higher priority than security or isolation. If this bothers you, stick to classic virtio-net.

virtio-net VM-to-VM extensions
------------------------------
A few extensions to virtio-net are necessary to support zero-copy VM-to-VM communication. The extensions are covered informally throughout the text; this is not a VIRTIO specification change proposal.

The VM-to-VM capable virtio-net PCI adapter has an additional MMIO BAR called the Shared Buffers BAR. The Shared Buffers BAR is backed by a shared memory region on the host so that the virtio-net devices in VM1 and VM2 both access the same region of memory.

The vring is still allocated in guest RAM as usual, but data buffers must be located in the Shared Buffers BAR in order to take advantage of zero-copy. When VM1 places a packet on its tx queue and the buffers are located in the Shared Buffers BAR, the host finds VM2's rx queue descriptor with the same buffer address and completes it without copying any data.
Shared buffer allocation
------------------------
A simple scheme for two cooperating VMs to manage the Shared Buffers BAR is as follows:

   VM1           VM2
       +---+
   rx->| 1 |<-tx
       +---+
   tx->| 2 |<-rx
       +---+
    Shared Buffers

This is a trivial example where the Shared Buffers BAR holds only two packet buffers. VM1 starts by putting buffer 1 on its rx queue. VM2 starts by putting buffer 2 on its rx queue. The VMs know which buffers to choose based on a new uint8_t virtio_net_config.shared_buffers_offset field (0 for VM1 and 1 for VM2).

VM1 can transmit to VM2 by filling buffer 2 and placing it on its tx queue. VM2 can transmit by filling buffer 1 and placing it on its tx queue.

As soon as a buffer is placed on a tx queue, the VM passes ownership of the buffer to the other VM. In other words, the buffer must not be touched even after virtio-net tx completion because it now belongs to the other VM.

This scheme of bouncing ownership back and forth between the two VMs only works if both VMs transmit an equal number of buffers over time. In reality the traffic pattern may be unbalanced, e.g. VM1 is always transmitting and VM2 is always receiving. This problem can be overcome if the VMs cooperate and return buffers when they accumulate too many. For example, after VM1 transmits buffer 2 it has run out of tx buffers:

   VM1           VM2
       +---+
   rx->| 1 |<-tx
       +---+
    X->| 2 |<-rx
       +---+

VM2 notices that it now holds all the buffers. It can donate a buffer back to VM1 by putting it on the tx queue with the new virtio_net_hdr.flags VIRTIO_NET_HDR_F_GIFT_BUFFER flag set. This flag indicates that this is not a packet but rather an empty gifted buffer. VM1 checks the flags field to detect that it has been gifted buffers.

Also note that zero-copy networking is not mutually exclusive with classic virtio-net. If a descriptor has buffer addresses outside the Shared Buffers BAR, then classic non-zero-copy virtio-net behavior occurs.
Host-side implementation
------------------------
The host facilitates zero-copy VM-to-VM communication by taking descriptors off tx queues and filling in the rx descriptors of the paired VM. In the Linux vhost_net implementation this could work as follows:

1. VM1 places buffer 2 on the tx queue and kicks the host. Ownership of
   the buffer no longer belongs to VM1.
2. vhost_net pops the buffer from VM1's tx queue and verifies that the
   buffer address is within the Shared Buffers BAR.
3. vhost_net finds the VM2 rx queue descriptor whose buffer address
   matches, completes that descriptor, and kicks VM2.
4. VM2 pops buffer 2 from the rx queue. It can now reuse this buffer for
   transmitting to VM1.

The vhost_net.ko kernel module needs a new ioctl for pairing vhost_net instances. This ioctl is used to establish the VM-to-VM connection between VM1's virtio-net and VM2's virtio-net.

Discussion
----------
The result is that applications in separate VMs can communicate in true zero-copy fashion. I think this approach could be fruitful in bringing virtio-net to VM-to-VM networking use cases. Unless virtio-net is extended for this use case, I'm afraid the DPDK and OpenDataPlane communities might steer clear of VIRTIO.

This is an idea I want to share, but I'm not working on a prototype. Feel free to flesh it out further and try it!

Open issues:
 * Multiple VMs?
 * Multiqueue?
 * Choice of shared buffer allocation algorithm?
 * etc.

Stefan
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization