We introduce the vhost-pci design in the virtio specification format.
To follow the naming conventions in the virtio specification, we call
the VM that sends packets to the destination VM the device VM, and the
VM that provides the vring and receives packets the driver VM.

Signed-off-by: Wei Wang <wei.w.wang@xxxxxxxxx>
---
 vhost-pci.patch | 341 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 341 insertions(+)
 create mode 100755 vhost-pci.patch

diff --git a/vhost-pci.patch b/vhost-pci.patch
new file mode 100755
index 0000000..341ba07
--- /dev/null
+++ b/vhost-pci.patch
@@ -0,0 +1,341 @@
+1. Vhost-pci Device
+
+1.1 Device ID
+TBD
+
+1.2 Virtqueues
+0 control receiveq
+1 control transmitq
+
+1.3 Feature Bits
+
+1.3.1 Local Feature Bits
+Currently no local feature bits are defined, so the standard virtio feature
+bits negotiation will always be successful and complete.
+
+1.3.2 Remote Feature Bits
+The remote feature bits are obtained from the frontend device and negotiated
+with the vhost-pci driver via the control transmitq. The negotiation steps
+are described in 1.5 Device Initialization.
+
+1.4 Device Configuration Layout
+None currently defined
+
+1.5 Device Initialization
+When a device VM boots, it creates a vhost-pci server socket.
+
+When a virtio device on the driver VM is created with a vhost-pci device
+specified as its backend, a client socket is created and connected to
+the server for message exchanges.
+
+The server and client communicate via socket messages. The server and the
+vhost-pci driver communicate via controlq messages. The server updates the
+driver via a control transmitq. The driver acknowledges the server via a
+control receiveq.
+
+Both the socket message and controlq message headers can be constructed using
+the following message info structure:
+struct vhost_pci_msg_info {
+#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0
+#define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK 1
+#define VHOST_PCI_MSG_TYPE_DEVICE_INFO 2
+#define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK 3
+#define VHOST_PCI_MSG_TYPE_FEATURE_BITS 4
+#define VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK 5
+	u16 msg_type;
+	u16 msg_version;
+	u32 msg_len;
+	u64 msg_seq;
+};
+The msg_seq field stores the message sequence number. Each client maintains
+its own message sequence number.
+
+The socket messages are preceded by the following header:
+struct vhost_pci_socket_hdr {
+	struct vhost_pci_msg_info msg_info;
+	u64 client_uuid;
+};
+The client_uuid field is generated by the client and is used to identify
+that client.
+
+The controlq messages are preceded by the following header:
+struct vhost_pci_controlq_hdr {
+	struct vhost_pci_msg_info msg_info;
+#define VHOST_PCI_FRONTEND_DEVICE_NET 1
+#define VHOST_PCI_FRONTEND_DEVICE_BLK 2
+#define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
+#define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
+#define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
+#define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
+	u32 device_type;
+	u64 device_id;
+};
+The device_type and device_id fields identify the frontend device (client).
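As an illustration of the header layout above, the following is a minimal sketch of how a client might fill in a socket header. It is not part of the proposed specification text. The vhost_pci_client structure and the vhost_pci_make_socket_hdr helper are hypothetical names, and the sketch assumes that msg_len counts payload bytes only, that msg_version is 0, and that the sequence number increases by one per message; the text above does not fix any of these.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width aliases for the u16/u32/u64 names used in the text. */
typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0

struct vhost_pci_msg_info {
	u16 msg_type;
	u16 msg_version;
	u32 msg_len;
	u64 msg_seq;
};

struct vhost_pci_socket_hdr {
	struct vhost_pci_msg_info msg_info;
	u64 client_uuid;
};

/* Hypothetical per-client state; not defined by the text. */
struct vhost_pci_client {
	u64 uuid;     /* generated once by the client */
	u64 next_seq; /* each client keeps its own sequence number */
};

/*
 * Build the socket header for one outgoing message.
 * Assumptions (not stated in the text): msg_len is the payload length
 * in bytes, msg_version is 0, and msg_seq increments by one per message.
 */
static struct vhost_pci_socket_hdr
vhost_pci_make_socket_hdr(struct vhost_pci_client *c, u16 type, u32 payload_len)
{
	struct vhost_pci_socket_hdr hdr;

	assert(c != NULL);
	hdr.msg_info.msg_type = type;
	hdr.msg_info.msg_version = 0;
	hdr.msg_info.msg_len = payload_len;
	hdr.msg_info.msg_seq = c->next_seq++;
	hdr.client_uuid = c->uuid;
	return hdr;
}
```

Keeping the sequence counter next to the uuid reflects the statement that each client maintains its own message sequence number.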
+
+The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO socket message can be
+constructed using the following structure:
+/* socket message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
+struct vhost_pci_socket_memory_info {
+#define VHOST_PCI_ADD_MEMORY 0
+#define VHOST_PCI_DEL_MEMORY 1
+	u16 ops;
+	u32 nregions;
+	struct vhost_pci_memory_region {
+		int fd;
+		u64 guest_phys_addr;
+		u64 memory_size;
+		u64 mmap_offset;
+	} regions[VHOST_PCI_MAX_NREGIONS];
+};
+
+The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO controlq message can be
+constructed using the following structure:
+/* controlq message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
+struct vhost_pci_controlq_memory_info {
+#define VHOST_PCI_ADD_MEMORY 0
+#define VHOST_PCI_DEL_MEMORY 1
+	u16 ops;
+	u32 nregion;
+	struct exotic_memory_region {
+		u64 region_base_xgpa;
+		u64 size;
+		u64 offset_in_bar_area;
+	} region[VHOST_PCI_MAX_NREGIONS];
+};
+
+The payload of VHOST_PCI_MSG_TYPE_DEVICE_INFO and
+VHOST_PCI_MSG_TYPE_FEATURE_BITS socket/controlq messages can be constructed
+using the following vhost_pci_device_info structure and
+the vhost_pci_feature_bits structure respectively.
+
+/* socket/controlq message: VHOST_PCI_MSG_TYPE_DEVICE_INFO */
+struct vhost_pci_device_info {
+#define VHOST_PCI_ADD_FRONTEND_DEVICE 0
+#define VHOST_PCI_DEL_FRONTEND_DEVICE 1
+	u16 ops;
+	u32 nvirtq;
+	u32 device_type;
+	u64 device_id;
+	struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
+};
+
+/* socket/controlq message: VHOST_PCI_MSG_TYPE_FEATURE_BITS */
+struct vhost_pci_feature_bits {
+	u64 feature_bits;
+};
+
+The payload of all the ACK socket/controlq messages can be constructed using
+the following structure:
+/* socket/controlq message: ACK messages */
+struct vhost_pci_ack {
+	union ack_msg {
+#define VHOST_PCI_ACK_ADD_DONE 0
+#define VHOST_PCI_ACK_ADD_FAIL 1
+#define VHOST_PCI_ACK_DEL_DONE 2
+#define VHOST_PCI_ACK_DEL_FAIL 3
+		u64 ack_memory_info;
+		u64 ack_device_info;
+		u64 ack_feature_bits;
+	};
+};
+
+1.5.1 Device Requirements: Device Initialization
+
+1.5.1.1 The Frontend Device (Client)
+The vhost-pci server socket path SHOULD be provided to a virtio client socket
+for the connection.
+
+The client SHOULD send three socket messages,
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD),
+VHOST_PCI_MSG_TYPE_FEATURE_BITS(FeatureBits)
+and VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD),
+to the server, and wait until it has received the corresponding three ACK
+messages from the server.
+
+The client may receive the following ACK socket messages from the server:
+1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the device
+VM has successfully mapped the memory, and a vhost-pci device is created on
+the device VM for the driver VM.
+2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the device
+VM fails to map the memory. Receiving this message results in the failure of
+setting up the vhost-pci based inter-VM communication support for the driver
+VM.
+3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the device
+VM has successfully initialized the related interfaces to communicate with the
+frontend device.
+4.
VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the device
+VM fails to initialize the related interfaces to communicate with the frontend
+device.
+5. VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS): The payload of
+this message contains the feature bits accepted by the vhost-pci device and
+driver. If the accepted feature bits are not equal to the feature bits sent by
+the client, the client MUST reset the device to go into backwards compatibility
+mode, re-negotiate the received ACCEPTED_FEATURE_BITS with its driver, and
+send back a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket
+message to the vhost-pci server. Otherwise, no
+VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket message is
+sent back to the server.
+
+1.5.1.2 The Vhost-pci Device (Server)
+To let a VM be capable of creating vhost-pci devices, a vhost-pci server MUST
+be created when it boots.
+
+When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD)
+socket message, it SHOULD check if a vhost-pci device has been created for the
+requesting VM. If the client_uuid contained in the socket message is not new
+to the server, the server SHOULD simply forward the received message to the
+vhost-pci driver via the control transmitq. Otherwise, the server SHOULD
+create a new vhost-pci device, and continue the following memory mapping
+related initialization steps.
+
+The vhost-pci server SHOULD add up the sizes of all the memory regions, and
+use a 64-bit device bar for the mapping of all the memory regions obtained from
+the socket message. To better support memory hot-plug on the driver VM, the bar
+SHOULD be configured with twice the size of the driver VM's memory. The server
+SHOULD map the received memory info via the QEMU MemoryRegion mechanism, and
+then the newly created vhost-pci device SHOULD be hot-plugged to the VM.
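The bar sizing rule in the paragraph above can be sketched as follows. This is only an illustration under stated assumptions, not part of the proposed text: the helper names vhost_pci_bar_size and round_up_pow2 are hypothetical, and the final rounding step is an addition that the text does not mention. It is included because PCI BAR sizes have to be powers of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round v up to the next power of two. This step is not in the text;
 * it is added here because a PCI BAR size must be a power of two.
 */
static uint64_t round_up_pow2(uint64_t v)
{
	uint64_t p = 1;

	assert(v <= (UINT64_C(1) << 63)); /* otherwise no 64-bit result exists */
	while (p < v)
		p <<= 1;
	return p;
}

/*
 * Hypothetical helper: add up the sizes of all memory regions received
 * in a MEMORY_INFO(ADD) socket message, double the total to leave room
 * for memory hot-plug, and round the result up to a valid BAR size.
 */
static uint64_t vhost_pci_bar_size(const uint64_t *region_sizes,
				   uint32_t nregions)
{
	uint64_t total = 0;
	uint32_t i;

	for (i = 0; i < nregions; i++) {
		assert(region_sizes[i] <= UINT64_MAX - total); /* no overflow */
		total += region_sizes[i];
	}
	assert(total <= UINT64_MAX / 2);
	return round_up_pow2(total * 2);
}
```

For example, regions of 1 GiB and 512 MiB add up to 1.5 GiB, which doubles to 3 GiB and rounds up to a 4 GiB bar.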
+
+When the device status is updated with DRIVER_OK, a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) message SHOULD be put on the control
+transmitq, and a controlq interrupt SHOULD be injected to the VM. The server
+may receive the following ACK messages from the driver via the control
+receiveq:
+1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
+has successfully added the memory info to its support. The server SHOULD send
+a VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the client.
+2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
+fails to add the memory info to its support. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the client.
+
+When the vhost-pci server receives a
+VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) socket message, it SHOULD put a
+VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) message on the control transmitq,
+and inject a controlq interrupt to the VM. When the server receives a
+VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature bits) controlq message
+from the VM, it SHOULD send a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted
+feature bits) socket message to the client. If the accepted feature bits sent
+to the client are not equal to the feature bits received from it, the server
+SHOULD wait until receiving a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted
+feature bits) socket message from the client, which indicates that the frontend
+device has finished the re-negotiation of the accepted feature bits.
+
+When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket
+message, it SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) message on the
+control transmitq, and inject a controlq interrupt to the VM. The server may
+receive the following ACK messages from the driver:
+1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the
+vhost-pci driver has successfully added the frontend device to its support
+list.
The server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE)
+socket message to the corresponding client.
+2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the
+vhost-pci driver fails to add the frontend device to its support list. The
+server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket
+message to the corresponding client.
+
+1.5.2 Driver Requirements: Device Initialization
+The vhost-pci driver SHOULD acknowledge the vhost-pci device via the control
+receiveq whether or not it succeeds in handling the received controlq message.
+The vhost-pci driver MUST NOT accept any feature bits that are not offered
+in the remote feature bits.
+
+When the vhost-pci driver receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD)
+controlq message, it MUST initialize the corresponding driver interfaces of
+the device type if they are not initialized, and add the device id to the
+support list that records all the frontend devices being supported by
+vhost-pci for inter-VM communications.
+
+1.6 Device Operation
+1.6.1 Device Requirements: Device Operation
+1.6.1.1 The Frontend Device (Client)
+When the frontend device changes any info (e.g. device_id, virtq address)
+that it has sent to the vhost-pci device, it MUST send a
+VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket message to the server. The
+vhost-pci device SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) controlq
+message to the control transmitq, and inject a controlq interrupt to the VM.
+
+When the frontend virtio device is removed (e.g. being hot-unplugged), the
+client SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to the
+server.
+
+Before the driver VM is destroyed or migrated, all the clients that connect to
+the server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to
+the server. The destroying or migrating activity MUST wait until all the
+VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket messages are received.
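The wait described in the paragraph above could be tracked on the driver VM side with a simple counter. The sketch below is illustrative only and is not part of the proposed text: the vhost_pci_teardown structure and its helper functions are hypothetical names. The text does not say what happens on DEL_FAIL, so the sketch assumes that a failure keeps the destroy or migrate activity blocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ACK codes as defined for struct vhost_pci_ack in the text. */
#define VHOST_PCI_ACK_DEL_DONE 2
#define VHOST_PCI_ACK_DEL_FAIL 3

/* Hypothetical bookkeeping for one destroy or migrate request. */
struct vhost_pci_teardown {
	uint32_t pending; /* DEVICE_INFO(DEL) messages not yet ACKed */
	bool failed;      /* set once any DEL_FAIL arrives (assumption) */
};

/* Called after every client has sent its DEVICE_INFO(DEL) message. */
static void vhost_pci_teardown_begin(struct vhost_pci_teardown *t,
				     uint32_t nclients)
{
	assert(t != NULL);
	t->pending = nclients;
	t->failed = false;
}

/* Called for each DEVICE_INFO_ACK socket message from the server. */
static void vhost_pci_teardown_on_ack(struct vhost_pci_teardown *t,
				      uint64_t ack_device_info)
{
	assert(t != NULL);
	if (ack_device_info == VHOST_PCI_ACK_DEL_DONE) {
		if (t->pending > 0)
			t->pending--;
	} else if (ack_device_info == VHOST_PCI_ACK_DEL_FAIL) {
		t->failed = true;
	}
}

/* Destroying or migrating may continue only after all DEL_DONE ACKs. */
static bool vhost_pci_teardown_may_proceed(const struct vhost_pci_teardown *t)
{
	assert(t != NULL);
	return t->pending == 0 && !t->failed;
}
```

A caller would invoke vhost_pci_teardown_begin with the number of connected clients and poll vhost_pci_teardown_may_proceed after each received ACK.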
+
+When the driver VM hot-adds or hot-removes memory, it SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message or
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message to the server.
+
+1.6.1.2 The Vhost-pci Device (Server)
+When the server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
+VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message, it SHOULD put a
+VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
+VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) message to the control transmitq,
+and inject a controlq interrupt to the VM. It may receive the following ACK
+controlq messages from the driver:
+1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the driver
+has successfully updated the device info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE) socket message to the
+corresponding client.
+2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the driver
+fails to update the device info. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket message to the
+corresponding client.
+3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE): It indicates that the driver
+has successfully removed the vhost-pci support for the frontend device. The
+server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket
+message to the corresponding client.
+4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL): It indicates that the driver
+fails to remove the vhost-pci support for the frontend device. The server
+SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL) socket message to
+the corresponding client.
+
+When there is no client of a driver VM connecting to the vhost-pci device,
+the server SHOULD destroy the vhost-pci device for that driver VM.
+
+When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message,
+it calculates the total size of the received memory.
If the new memory size
+plus the mapped memory size is smaller than the address space size reserved by
+the bar, the server SHOULD map the new memory and expose it to the VM via the
+QEMU MemoryRegion mechanism. Then it SHOULD put a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) controlq message to the control transmitq,
+and inject a controlq interrupt to the VM.
+
+If the new memory size plus the mapped memory size is larger than the address
+space size reserved by the bar, the server SHOULD
+1. clone out a new vhost-pci device;
+2. configure the bar size to be double the current memory size; and
+3. hot-plug out the old vhost-pci device, and hot-plug in the new vhost-pci
+device to the VM.
+
+The initialization steps SHOULD follow 1.5 Device Initialization, except that
+the messages exchanged between the server and the client are not needed.
+
+The server may receive the following two memory info add related ACK controlq
+messages from the driver:
+1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
+has successfully added the new memory info support. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the corresponding
+client.
+2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
+fails to add the new memory info support. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the corresponding
+client.
+
+When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message,
+it SHOULD put a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) controlq message to the
+control transmitq, and inject a controlq interrupt to the VM. The server may
+receive the following two memory ACK controlq messages from the driver:
+1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE): It indicates that the driver
+has successfully deleted the memory info support. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE) socket message to the
+corresponding client.
+2.
VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL): It indicates that the driver
+fails to delete the memory info support. The server SHOULD send a
+VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL) socket message to the
+corresponding client.
+
+1.6.2 Driver Requirements: Device Operation
+The vhost-pci driver SHOULD ensure that all the CPUs are notified of the
+VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) and VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL)
+controlq messages before acknowledging the server.
-- 
1.8.3.1