Re: [PATCH] Vhost-pci RFC v2: a new virtio device for inter-VM communication

On Sun, Jun 19, 2016 at 10:14:09PM +0800, Wei Wang wrote:
> We introduce the vhost-pci design in the virtio specification format.
> To follow the naming conventions in the virtio specification, we call
> the VM who sends packets to the destination VM the device VM, and the
> VM who provides the vring and receives packets the driver VM.
> 
> Signed-off-by: Wei Wang <wei.w.wang@xxxxxxxxx>
> ---
>  vhost-pci.patch | 341 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 341 insertions(+)
>  create mode 100755 vhost-pci.patch

Adding Marc-André on CC because vhost-pci has a natural parallel to
vhost-user.  Instead of terminating the virtio device in a host
userspace process it terminates the device in a VM.  The design lessons
from vhost-user still apply though.

Marc-André: Do you have time to review this proposal?

> diff --git a/vhost-pci.patch b/vhost-pci.patch
> new file mode 100755
> index 0000000..341ba07
> --- /dev/null
> +++ b/vhost-pci.patch
> @@ -0,0 +1,341 @@
> +1. Vhost-pci Device
> +
> +1.1 Device ID
> +TBD
> +
> +1.2 Virtqueues
> +0 control receiveq
> +1 control transmitq
> +
> +1.3 Feature Bits
> +
> +1.3.1 Local Feature Bits
> +Currently no local feature bits are defined, so the standard virtio feature
> +bits negotiation will always be successful and complete.
> +
> +1.3.2 Remote Feature Bits
> +The remote feature bits are obtained from the frontend device and negotiated
> +with the vhost-pci driver via the control transmitq. The negotiation steps
> +are described in 1.5 Device Initialization.
> +
> +1.4 Device Configuration Layout
> +None currently defined
> +
> +1.5 Device Initialization
> +When a device VM boots, it creates a vhost-pci server socket.
> +
> +When a virtio device on the driver VM is created with a vhost-pci device
> +specified as its backend, a client socket is created and connected to
> +the server for message exchanges.
> +
> +The server and client communicate via socket messages. The server and the
> +vhost-pci driver communicate via controlq messages. The server updates the
> +driver via a control transmitq. The driver acknowledges the server via a
> +control receiveq.
> +
> +Both the socket message and controlq message headers can be constructed using
> +the following message info structure:
> +struct vhost_pci_msg_info {
> +#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0
> +#define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK 1
> +#define VHOST_PCI_MSG_TYPE_DEVICE_INFO 2
> +#define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK 3
> +#define VHOST_PCI_MSG_TYPE_FEATURE_BITS 4
> +#define VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK 5
> +	u16 msg_type;
> +	u16 msg_version;
> +	u32 msg_len;
> +	u64 msg_seq;
> +};
> +The msg_seq field stores the message sequence number. Each client maintains
> +its own message sequence number.
> +
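To check my reading of the common header, here is a minimal sketch of how a client might fill it in. The helper name, the per-client state struct, the version value, and the assumption that msg_len counts only the payload bytes are mine; the proposal does not specify them.

```c
#include <stdint.h>
#include <string.h>

#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0

struct vhost_pci_msg_info {
	uint16_t msg_type;
	uint16_t msg_version;
	uint32_t msg_len;
	uint64_t msg_seq;
};

/* Hypothetical per-client state: the text only says that each client
 * maintains its own sequence number. */
struct vhost_pci_client {
	uint64_t next_seq;
};

/* Fill a header and advance the client's sequence number.
 * Assumption: msg_len is the payload length, excluding the header. */
static void vhost_pci_fill_msg_info(struct vhost_pci_client *c,
				    struct vhost_pci_msg_info *info,
				    uint16_t type, uint32_t payload_len)
{
	memset(info, 0, sizeof(*info));
	info->msg_type = type;
	info->msg_version = 1; /* assumption: no version value is defined yet */
	info->msg_len = payload_len;
	info->msg_seq = c->next_seq++;
}
```

If that is the intent, it would help to state explicitly what msg_len covers and which byte order the fields use.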
> +The socket messages are preceded by the following header:
> +struct vhost_pci_socket_hdr {
> +	struct vhost_pci_msg_info msg_info;
> +	u64 client_uuid;
> +};
> +The client_uuid field is generated by the client and identifies the client.
> +
> +The controlq messages are preceded by the following header:
> +struct vhost_pci_controlq_hdr {
> +	struct vhost_pci_msg_info msg_info;
> +#define VHOST_PCI_FRONTEND_DEVICE_NET 1
> +#define VHOST_PCI_FRONTEND_DEVICE_BLK 2
> +#define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
> +#define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
> +#define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
> +#define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
> +	u32 device_type;
> +	u64 device_id;
> +};
> +The device_type and device_id fields identify the frontend device (client).
> +
> +The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO socket message can be
> +constructed using the following structure:
> +/* socket message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
> +struct vhost_pci_socket_memory_info {
> +#define VHOST_PCI_ADD_MEMORY 0
> +#define VHOST_PCI_DEL_MEMORY 1
> +	u16 ops;
> +	u32 nregions;
> +	struct vhost_pci_memory_region {
> +		int fd;
> +		u64 guest_phys_addr;
> +		u64 memory_size;
> +		u64 mmap_offset;
> +	} regions[VHOST_PCI_MAX_NREGIONS];
> +};
> +
> +The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO controlq message can be
> +constructed using the following structure:
> +/* controlq message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
> +struct vhost_pci_controlq_memory_info {
> +#define VHOST_PCI_ADD_MEMORY 0
> +#define VHOST_PCI_DEL_MEMORY 1
> +	u16  ops;
> +	u32 nregion;
> +	struct exotic_memory_region {
> +		u64   region_base_xgpa;
> +		u64   size;
> +		u64   offset_in_bar_area;
> +	} region[VHOST_PCI_MAX_NREGIONS];
> +};
> +
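The relationship between the two memory info payloads is not spelled out, so here is a rough sketch of how I imagine the server turning socket regions into controlq regions. It assumes the regions are packed back to back in the BAR, which is only one possible layout policy; the function name and the VHOST_PCI_MAX_NREGIONS value are placeholders.

```c
#include <stdint.h>

#define VHOST_PCI_MAX_NREGIONS 8 /* assumption: the proposal leaves this undefined */

struct vhost_pci_memory_region {
	int fd;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t mmap_offset;
};

struct exotic_memory_region {
	uint64_t region_base_xgpa;
	uint64_t size;
	uint64_t offset_in_bar_area;
};

/* Translate socket regions into controlq regions, placing them
 * contiguously in the BAR (an assumed policy).
 * Returns the number of BAR bytes used, or 0 if there are too many regions. */
static uint64_t vhost_pci_layout_regions(const struct vhost_pci_memory_region *in,
					 uint32_t n,
					 struct exotic_memory_region *out)
{
	uint64_t offset = 0;
	uint32_t i;

	if (n > VHOST_PCI_MAX_NREGIONS)
		return 0;
	for (i = 0; i < n; i++) {
		out[i].region_base_xgpa = in[i].guest_phys_addr;
		out[i].size = in[i].memory_size;
		out[i].offset_in_bar_area = offset;
		offset += in[i].memory_size;
	}
	return offset;
}
```

The fd and mmap_offset fields are only meaningful on the host side, which is presumably why they do not appear in the controlq variant.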
> +The payload of VHOST_PCI_MSG_TYPE_DEVICE_INFO and
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS socket/controlq messages can be constructed
> +using the following vhost_pci_device_info structure and
> +the vhost_pci_feature_bits structure respectively.
> +
> +/* socket/controlq message: VHOST_PCI_MSG_TYPE_DEVICE_INFO */
> +struct vhost_pci_device_info {
> +#define VHOST_PCI_ADD_FRONTEND_DEVICE 0
> +#define VHOST_PCI_DEL_FRONTEND_DEVICE 1
> +	u16    ops;
> +	u32    nvirtq;
> +	u32    device_type;
> +	u64    device_id;
> +	struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
> +};
> +
> +/* socket/controlq message: VHOST_PCI_MSG_TYPE_FEATURE_BITS */
> +struct vhost_pci_feature_bits {
> +	u64 feature_bits;
> +};
> +
> +The payload of all the ACK socket/controlq messages can be constructed using
> +the following structure:
> +/* socket/controlq message: ACK messages */
> +struct vhost_pci_ack {
> +	union ack_msg {
> +#define VHOST_PCI_ACK_ADD_DONE 0
> +#define VHOST_PCI_ACK_ADD_FAIL 1
> +#define VHOST_PCI_ACK_DEL_DONE 2
> +#define VHOST_PCI_ACK_DEL_FAIL 3
> +		u64 ack_memory_info;
> +		u64 ack_device_info;
> +		u64 ack_feature_bits;
> +	};
> +};
> +
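Two conventions seem implicit in the definitions above: each ACK message type is its request type plus one, and the DONE/FAIL codes apply to the memory info and device info ACKs (the feature bits ACK carries the accepted bits instead, per the text further down). A small sketch of helpers built on that reading; the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0
#define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK 1
#define VHOST_PCI_MSG_TYPE_DEVICE_INFO 2
#define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK 3
#define VHOST_PCI_MSG_TYPE_FEATURE_BITS 4
#define VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK 5

#define VHOST_PCI_ACK_ADD_DONE 0
#define VHOST_PCI_ACK_ADD_FAIL 1
#define VHOST_PCI_ACK_DEL_DONE 2
#define VHOST_PCI_ACK_DEL_FAIL 3

/* In the proposed numbering every ACK type is its request type plus one. */
static uint16_t vhost_pci_ack_type_for(uint16_t request_type)
{
	return (uint16_t)(request_type + 1);
}

/* True for the request types, which are the even values defined so far. */
static bool vhost_pci_is_request(uint16_t msg_type)
{
	return msg_type <= VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK &&
	       (msg_type & 1) == 0;
}

/* True if a MEMORY_INFO or DEVICE_INFO ACK code reports success. */
static bool vhost_pci_ack_succeeded(uint64_t ack_code)
{
	return ack_code == VHOST_PCI_ACK_ADD_DONE ||
	       ack_code == VHOST_PCI_ACK_DEL_DONE;
}
```

If the plus-one relationship is intended, it is worth stating in the text so that future message types keep to it.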
> +1.5.1 Device Requirements: Device Initialization
> +
> +1.5.1.1	The Frontend Device (Client)
> +The vhost-pci server socket path SHOULD be provided to a virtio client socket
> +for the connection.
> +
> +The client SHOULD send three socket messages,
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD),
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(FeatureBits)
> +and VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD),
> +to the server, and wait until receiving the corresponding three ACK
> +messages from the server.
> +
> +The client may receive the following ACK socket messages from the server:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the device
> +VM has successfully mapped the memory, and a vhost-pci device is created on
> +the device VM for the driver VM.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the device
> +VM fails to map the memory. Receiving this message results in the failure of
> +setting up the vhost-pci based inter-VM communication support for the driver
> +VM.
> +3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the device
> +VM has successfully initialized the related interfaces to communicate with the
> +frontend device.
> +4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the device
> +VM fails to initialize the related interfaces to communicate with the
> +frontend device.
> +5. VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS): The payload of
> +this message contains the feature bits accepted by the vhost-pci device and
> +driver. If the accepted feature bits are not equal to the feature bits sent by
> +the client, the client MUST reset the device to go into backwards
> +compatibility mode, re-negotiate the received ACCEPTED_FEATURE_BITS with its
> +driver, and send back a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket message to
> +the vhost-pci server. Otherwise, no
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket message is
> +sent back to the server.
> +
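The client-side handling of the feature bits ACK in item 5 boils down to a three-way decision. Here is a sketch of how I read it. The enum and function names are made up, and treating an accepted bit that was never offered as a protocol error is my assumption; the text does not say what the client should do in that case.

```c
#include <stdint.h>

/* Hypothetical outcome codes for the client. */
enum vhost_pci_client_action {
	VHOST_PCI_CLIENT_DONE,           /* accepted == offered: nothing to send back */
	VHOST_PCI_CLIENT_RENEGOTIATE,    /* reset, re-negotiate, then send the ACK back */
	VHOST_PCI_CLIENT_PROTOCOL_ERROR  /* assumption: a bit was accepted but never offered */
};

/* Decide what the client does after receiving FEATURE_BITS_ACK(accepted). */
static enum vhost_pci_client_action
vhost_pci_client_on_feature_ack(uint64_t offered, uint64_t accepted)
{
	if (accepted & ~offered)
		return VHOST_PCI_CLIENT_PROTOCOL_ERROR;
	if (accepted != offered)
		return VHOST_PCI_CLIENT_RENEGOTIATE;
	return VHOST_PCI_CLIENT_DONE;
}
```

The re-negotiation path implies resetting a device that the guest driver may already be using, so the spec text should probably say when in the frontend's life cycle this exchange happens.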
> +1.5.1.2	The Vhost-pci Device (Server)
> +To let a VM be capable of creating vhost-pci devices, a vhost-pci server MUST
> +be created when it boots.
> +
> +When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD)
> +socket message, it SHOULD check if a vhost-pci device has been created for the
> +requesting VM. If the client_uuid contained in the socket message is not new
> +to the server, the server SHOULD simply update the received message to the
> +vhost-pci driver via the control transmitq. Otherwise, the server SHOULD
> +create a new vhost-pci device, and continue the following memory mapping
> +related initialization steps.
> +
> +The vhost-pci server SHOULD sum the sizes of all the memory regions, and use a
> +64-bit device BAR for the mapping of all the memory regions obtained from the
> +socket message. To better support memory hot-plug on the driver VM, the BAR
> +SHOULD be configured with double the size of the driver VM's memory. The
> +server SHOULD map the received memory info via the QEMU MemoryRegion
> +mechanism, and then the newly created vhost-pci device SHOULD be hot-plugged
> +to the VM.
> +
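For the BAR sizing rule, a minimal sketch of the arithmetic as I understand it: sum the region sizes, double the total, then round up to a power of two, since PCI BAR sizes must be powers of two. The rounding step and the overflow handling are my additions, and the function names are hypothetical.

```c
#include <stdint.h>

/* Round up to the next power of two; returns 0 if that does not fit in 64 bits. */
static uint64_t vhost_pci_roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p != 0 && p < v)
		p <<= 1;
	return p;
}

/* BAR size for the given region sizes: twice the total, rounded up to a
 * power of two. Returns 0 if there is no memory or the arithmetic overflows. */
static uint64_t vhost_pci_bar_size(const uint64_t *region_sizes, uint32_t n)
{
	uint64_t total = 0;
	uint32_t i;

	for (i = 0; i < n; i++) {
		if (region_sizes[i] > UINT64_MAX - total)
			return 0; /* sum would overflow */
		total += region_sizes[i];
	}
	if (total == 0 || total > UINT64_MAX / 2)
		return 0;
	return vhost_pci_roundup_pow2(total * 2);
}
```

For example, regions of 0xA0000 and 0x3FF00000 bytes total 0x3FFA0000, which doubles to 0x7FF40000 and rounds up to a 2 GiB BAR.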
> +When the device status is updated with DRIVER_OK, a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) message SHOULD be put on the control
> +transmitq, and a controlq interrupt SHOULD be injected to the VM. The server
> +may receive the following ACK messages from the driver via the control
> +receiveq:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully added the memory info to its support. The server SHOULD send
> +a VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to add the memory info to its support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the client.
> +
> +When the vhost-pci server receives a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) socket message, it SHOULD put a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) message on the control transmitq,
> +and inject a controlq interrupt to the VM. When the server receives a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature bits) controlq message
> +from the VM, it SHOULD send a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted
> +feature bits) socket message to the client. If the accepted feature bits sent
> +to the client do not equal the feature bits that it received from the client,
> +the server SHOULD wait until receiving a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature bits) socket message from
> +the client, which indicates that the frontend device has finished the
> +re-negotiation of the accepted feature bits.
> +
> +When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket
> +message, it SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) message on the
> +control transmitq, and inject a controlq interrupt to the VM. The server may
> +receive the following ACK messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the
> +vhost-pci driver has successfully added the frontend device to its support
> +list. The server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE)
> +socket message to the corresponding client.
> +2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the
> +vhost-pci driver fails to add the frontend device to its support list. The
> +server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket
> +message to the corresponding client.
> +
> +1.5.2 Driver Requirements: Device Initialization
> +The vhost-pci driver SHOULD acknowledge the vhost-pci device via the control
> +receiveq whether or not it succeeds in handling the received controlq message.
> +The vhost-pci driver MUST NOT accept any feature bits that are not offered
> +in the remote feature bits.
> +
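The MUST NOT rule on remote feature bits is the usual virtio subset rule. A minimal sketch, assuming the driver keeps a mask of the frontend features it can handle; the function names and the driver_supported parameter are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Accept only bits that the frontend offered and the driver supports. */
static uint64_t vhost_pci_driver_accept(uint64_t remote_features,
					uint64_t driver_supported)
{
	return remote_features & driver_supported;
}

/* Check the MUST NOT rule: accepted must be a subset of the remote bits. */
static bool vhost_pci_accept_is_valid(uint64_t remote_features,
				      uint64_t accepted)
{
	return (accepted & ~remote_features) == 0;
}
```

With this shape the result of vhost_pci_driver_accept() always passes vhost_pci_accept_is_valid(), so the server could run the same check on the ACK it receives from the driver.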
> +When the vhost-pci driver receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD)
> +controlq message, it MUST initialize the corresponding driver interfaces of
> +the device type if they are not initialized, and add the device id to the
> +support list that records all the frontend devices being supported by
> +vhost-pci for inter-VM communications.
> +
> +1.6 Device Operation
> +1.6.1 Device Requirements: Device Operation
> +1.6.1.1 The Frontend Device (Client)
> +When the frontend device changes any info (e.g. device_id, virtq address)
> +that it has sent to the vhost-pci device, it MUST send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket message to the server. The
> +vhost-pci device SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) controlq
> +message to the control transmitq, and inject a controlq interrupt to the VM.
> +
> +When the frontend virtio device is removed (e.g. hot-unplugged), the
> +client SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to the
> +server.
> +
> +Before the driver VM is destroyed or migrated, all the clients that connect to
> +the server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to
> +the server. The destroying or migrating activity MUST wait until all the
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket messages are received.
> +
> +When the driver VM hot-adds or hot-removes memory, it SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message or
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message to the server.
> +
> +1.6.1.2 The Vhost-pci Device (Server)
> +When the server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message, it SHOULD put a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) message to the control transmitq,
> +and inject a controlq interrupt to the VM. It may receive the following ACK
> +controlq messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully updated the device info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE) socket message to the
> +corresponding client.
> +2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to update the device info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket message to the
> +corresponding client.
> +3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE): It indicates that the driver
> +has successfully removed the vhost-pci support for the frontend device. The
> +server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket
> +message to the corresponding client.
> +4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL): It indicates that the driver
> +fails to remove the vhost-pci support for the frontend device. The server
> +SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL) socket message to
> +the corresponding client.
> +
> +When there is no client of a driver VM connecting to the vhost-pci device,
> +the server SHOULD destroy the vhost-pci device for that driver VM.
> +
> +When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message,
> +it calculates the total size of the received memory. If the new memory size
> +plus the mapped memory size is smaller than the address space size reserved by
> +the BAR, the server SHOULD map the new memory and expose it to the VM via the
> +QEMU MemoryRegion mechanism. Then it SHOULD put a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) controlq message to the control transmitq,
> +and inject a controlq interrupt to the VM.
> +
> +If the new memory size plus the mapped memory size is larger than the address
> +space size reserved by the BAR, the server SHOULD
> +1. clone out a new vhost-pci device;
> +2. configure the BAR size to be double the current memory size; and
> +3. hot-plug out the old vhost-pci device, and hot-plug in the new vhost-pci
> +device to the VM.
> +
> +The initialization steps SHOULD follow 1.5 Device Initialization, except that
> +the messages exchanged between the server and client are not needed.
> +
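The hot-add path has two branches depending on whether the new memory still fits in the BAR. A sketch of that check, written so that the addition cannot overflow. The text says "smaller than" and "larger than" and leaves the exactly-equal case open; I treat it as fitting, which is an assumption. All names are hypothetical.

```c
#include <stdint.h>

enum vhost_pci_hotadd_action {
	VHOST_PCI_HOTADD_MAP_IN_PLACE,   /* map into the existing BAR */
	VHOST_PCI_HOTADD_REPLACE_DEVICE  /* clone a device with a larger BAR and swap */
};

/* Decide how to handle MEMORY_INFO(ADD) for new_size bytes, given the
 * BAR size and the number of bytes already mapped into it. */
static enum vhost_pci_hotadd_action
vhost_pci_hotadd_decide(uint64_t bar_size, uint64_t mapped, uint64_t new_size)
{
	/* Compare against the remaining space instead of computing
	 * mapped + new_size, which could wrap around. */
	if (mapped <= bar_size && new_size <= bar_size - mapped)
		return VHOST_PCI_HOTADD_MAP_IN_PLACE;
	return VHOST_PCI_HOTADD_REPLACE_DEVICE;
}
```

The replace branch unplugs the device underneath a running driver, so it would be good to describe what happens to in-flight traffic during the swap.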
> +The server may receive the following two memory info add related ACK controlq
> +messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully added the new memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the corresponding
> +client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to add the new memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the corresponding
> +client.
> +
> +When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message,
> +it SHOULD put a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) controlq message to the
> +control transmitq, and inject a controlq interrupt to the VM. The server may
> +receive the following two memory ACK controlq messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE): It indicates that the driver
> +has successfully deleted the memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE) socket message to the
> +corresponding client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL): It indicates that the driver
> +fails to delete the memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL) socket message to the
> +corresponding client.
> +
> +1.6.2 Driver Requirements: Device Operation
> +The vhost-pci driver SHOULD ensure that all the CPUs have been notified of the
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) and VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL)
> +controlq messages before acknowledging the server.
> -- 
> 1.8.3.1
> 
