On Sun, Jun 19, 2016 at 10:14:09PM +0800, Wei Wang wrote:
> We introduce the vhost-pci design in the virtio specification format.
> To follow the naming conventions in the virtio specification, we call
> the VM who sends packets to the destination VM the device VM, and the
> VM who provides the vring and receives packets the driver VM.
>
> Signed-off-by: Wei Wang <wei.w.wang@xxxxxxxxx>
> ---
>  vhost-pci.patch | 341 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 341 insertions(+)
>  create mode 100755 vhost-pci.patch

Adding Marc-André on CC because vhost-pci has a natural parallel to
vhost-user. Instead of terminating the virtio device in a host
userspace process, it terminates the device in a VM. The design lessons
from vhost-user still apply though.

Marc-André: Do you have time to review this proposal?

> diff --git a/vhost-pci.patch b/vhost-pci.patch
> new file mode 100755
> index 0000000..341ba07
> --- /dev/null
> +++ b/vhost-pci.patch
> @@ -0,0 +1,341 @@
> +1. Vhost-pci Device
> +
> +1.1 Device ID
> +TBD
> +
> +1.2 Virtqueues
> +0 control receiveq
> +1 control transmitq
> +
> +1.3 Feature Bits
> +
> +1.3.1 Local Feature Bits
> +Currently no local feature bits are defined, so the standard virtio feature
> +bits negotiation will always be successful and complete.
> +
> +1.3.2 Remote Feature Bits
> +The remote feature bits are obtained from the frontend device and negotiated
> +with the vhost-pci driver via the control transmitq. The negotiation steps
> +are described in 1.5 Device Initialization.
> +
> +1.4 Device Configuration Layout
> +None currently defined
> +
> +1.5 Device Initialization
> +When a device VM boots, it creates a vhost-pci server socket.
> +
> +When a virtio device on the driver VM is created with specifying the use of
> +a vhost-pci device as a backend, a client socket is created and connected to
> +the server for message exchanges.
> +
> +The server and client communicate via socket messages.
> +The server and the vhost-pci driver communicate via controlq messages. The
> +server updates the driver via a control transmitq. The driver acknowledges
> +the server via a control receiveq.
> +
> +Both the socket message and controlq message headers can be constructed using
> +the following message info structure:
> +struct vhost_pci_msg_info {
> +#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0
> +#define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK 1
> +#define VHOST_PCI_MSG_TYPE_DEVICE_INFO 2
> +#define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK 3
> +#define VHOST_PCI_MSG_TYPE_FEATURE_BITS 4
> +#define VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK 5
> +        u16 msg_type;
> +        u16 msg_version;
> +        u32 msg_len;
> +        u64 msg_seq;
> +};
> +The msg_seq field stores the message sequence number. Each client maintains
> +its own message sequence number.
> +
> +The socket messages are preceded by the following header:
> +struct vhost_pci_socket_hdr {
> +        struct vhost_pci_msg_info msg_info;
> +        u64 client_uuid;
> +};
> +The client_uuid field is generated by the client for the client identification
> +purpose.
> +
> +The controlq messages are preceded by the following header:
> +struct vhost_pci_controlq_hdr {
> +        struct vhost_pci_msg_info msg_info;
> +#define VHOST_PCI_FRONTEND_DEVICE_NET 1
> +#define VHOST_PCI_FRONTEND_DEVICE_BLK 2
> +#define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
> +#define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
> +#define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
> +#define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
> +        u32 device_type;
> +        u64 device_id;
> +};
> +The device_type and device_id fields identify the frontend device (client).
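
To make the header layout and the per-client sequence number concrete, here
is a minimal standalone sketch of how a client might fill in a socket header.
It is an illustration, not part of the patch: the fixed-width <stdint.h> types
stand in for u16/u32/u64, and struct vhost_pci_client and
vhost_pci_make_socket_hdr() are hypothetical names. The patch also does not
say whether msg_len covers the header or only the payload; the sketch assumes
payload only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Message types as listed in the quoted patch. */
#define VHOST_PCI_MSG_TYPE_MEMORY_INFO      0
#define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK  1
#define VHOST_PCI_MSG_TYPE_DEVICE_INFO      2
#define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK  3
#define VHOST_PCI_MSG_TYPE_FEATURE_BITS     4
#define VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK 5

struct vhost_pci_msg_info {
    uint16_t msg_type;
    uint16_t msg_version;
    uint32_t msg_len;
    uint64_t msg_seq;
};

struct vhost_pci_socket_hdr {
    struct vhost_pci_msg_info msg_info;
    uint64_t client_uuid;
};

/* Hypothetical per-client state; the patch only says that each client
 * maintains its own sequence number. */
struct vhost_pci_client {
    uint64_t uuid;
    uint64_t next_seq;
};

/* Hypothetical helper: build the header for the next socket message and
 * advance the client's sequence number. msg_len is assumed to be the
 * payload length only. */
static struct vhost_pci_socket_hdr
vhost_pci_make_socket_hdr(struct vhost_pci_client *c, uint16_t type,
                          uint16_t version, uint32_t payload_len)
{
    struct vhost_pci_socket_hdr hdr;

    assert(c != NULL);
    memset(&hdr, 0, sizeof(hdr)); /* also clears any padding bytes */
    hdr.msg_info.msg_type = type;
    hdr.msg_info.msg_version = version;
    hdr.msg_info.msg_len = payload_len;
    hdr.msg_info.msg_seq = c->next_seq++;
    hdr.client_uuid = c->uuid;
    return hdr;
}
```

Two consecutive calls on the same client yield msg_seq values n and n + 1,
which is what lets the server match an ACK to the request it answers.
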
> +
> +The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO socket message can be
> +constructed using the following structure:
> +/* socket message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
> +struct vhost_pci_socket_memory_info {
> +#define VHOST_PCI_ADD_MEMORY 0
> +#define VHOST_PCI_DEL_MEMORY 1
> +        u16 ops;
> +        u32 nregions;
> +        struct vhost_pci_memory_region {
> +                int fd;
> +                u64 guest_phys_addr;
> +                u64 memory_size;
> +                u64 mmap_offset;
> +        } regions[VHOST_PCI_MAX_NREGIONS];
> +};
> +
> +The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO controlq message can be
> +constructed using the following structure:
> +/* controlq message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
> +struct vhost_pci_controlq_memory_info {
> +#define VHOST_PCI_ADD_MEMORY 0
> +#define VHOST_PCI_DEL_MEMORY 1
> +        u16 ops;
> +        u32 nregion;
> +        struct exotic_memory_region {
> +                u64 region_base_xgpa;
> +                u64 size;
> +                u64 offset_in_bar_area;
> +        } region[VHOST_PCI_MAX_NREGIONS];
> +};
> +
> +The payload of VHOST_PCI_MSG_TYPE_DEVICE_INFO and
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS socket/controlq messages can be constructed
> +using the following vhost_pci_device_info structure and
> +the vhost_pci_feature_bits structure respectively.
> +
> +/* socket/controlq message: VHOST_PCI_MSG_TYPE_DEVICE_INFO */
> +struct vhost_pci_device_info {
> +#define VHOST_PCI_ADD_FRONTEND_DEVICE 0
> +#define VHOST_PCI_DEL_FRONTEND_DEVICE 1
> +        u16 ops;
> +        u32 nvirtq;
> +        u32 device_type;
> +        u64 device_id;
> +        struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
> +};
> +
> +/* socket/controlq message: VHOST_PCI_MSG_TYPE_FEATURE_BITS */
> +struct vhost_pci_feature_bits {
> +        u64 feature_bits;
> +};
> +
> +The payload of all the ACK socket/controlq messages can be constructed using
> +the following structure:
> +/* socket/controlq message: ACK messages */
> +struct vhost_pci_ack {
> +        union ack_msg {
> +#define VHOST_PCI_ACK_ADD_DONE 0
> +#define VHOST_PCI_ACK_ADD_FAIL 1
> +#define VHOST_PCI_ACK_DEL_DONE 2
> +#define VHOST_PCI_ACK_DEL_FAIL 3
> +                u64 ack_memory_info;
> +                u64 ack_device_info;
> +                u64 ack_feature_bits;
> +        };
> +};
> +
> +1.5.1 Device Requirements: Device Initialization
> +
> +1.5.1.1 The Frontend Device (Client)
> +The vhost-pci server socket path SHOULD be provided to a virtio client socket
> +for the connection.
> +
> +The client SHOULD send three socket messages,
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD),
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(FeatureBits)
> +and VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD),
> +to the server, and wait until receiving the corresponding three ACK
> +messages from the server.
> +
> +The client may receive the following ACK socket messages from the server:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the device
> +VM has successfully mapped the memory, and a vhost-pci device is created on
> +the device VM for the driver VM.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the device
> +VM fails to map the memory. Receiving this message results in the failure of
> +setting up the vhost-pci based inter-VM communication support for the driver
> +VM.
> +3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the device
> +VM has successfully initialized the related interfaces to communicate with the
> +frontend device.
> +4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the device
> +VM fails to initialize the related interfaces to communicate with the frontend
> +device.
> +5. VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS): The payload of
> +this message contains the feature bits accepted by the vhost-pci device and
> +driver. If the accepted feature bits are not equal to the feature bits sent by
> +the client, the client MUST reset the device to go into backwards compatibility
> +mode, re-negotiate the received ACCEPTED_FEATURE_BITS with its driver, and
> +send back a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket
> +message to the vhost-pci server. Otherwise, no
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket message is
> +sent back to the server.
> +
> +1.5.1.2 The Vhost-pci Device (Server)
> +To let a VM be capable of creating vhost-pci devices, a vhost-pci server MUST
> +be created when it boots.
> +
> +When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD)
> +socket message, it SHOULD check if a vhost-pci device has been created for the
> +requesting VM. If the client_uuid contained in the socket message is not new
> +to the server, the server SHOULD simply update the received message to the
> +vhost-pci driver via the control transmitq. Otherwise, the server SHOULD
> +create a new vhost-pci device, and continue the following memory mapping
> +related initialization steps.
> +
> +The vhost-pci server SHOULD add up all the memory region sizes, and use a
> +64-bit device bar for the mapping of all the memory regions obtained from the
> +socket message. To better support the driver VM to hot-plug memory, the bar
> +SHOULD be configured with double the size of the driver VM's memory.
> +The server SHOULD map the received memory info via the QEMU MemoryRegion
> +mechanism, and then the newly created vhost-pci device SHOULD be hot-plugged
> +to the VM.
> +
> +When the device status is updated with DRIVER_OK, a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) message SHOULD be put on the control
> +transmitq, and a controlq interrupt SHOULD be injected to the VM. The server
> +may receive the following ACK messages from the driver via the control
> +receiveq:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully added the memory info to its support. The server SHOULD send
> +a VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to add the memory info to its support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the client.
> +
> +When the vhost-pci server receives a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) socket message, it SHOULD put a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) message on the control transmitq,
> +and inject a controlq interrupt to the VM. When the server receives a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature bits) controlq message
> +from the VM, it SHOULD send a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted
> +feature bits) socket message to the client. If the accepted feature bits sent
> +to the client do not equal the ones that it received, the server SHOULD
> +wait until receiving a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature
> +bits) socket message from the client, which indicates that the frontend device
> +has finished the re-negotiation of the accepted feature bits.
> +
> +When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket
> +message, it SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) message on the
> +control transmitq, and inject a controlq interrupt to the VM.
> +The server may receive the following ACK messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the
> +vhost-pci driver has successfully added the frontend device to its support
> +list. The server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE)
> +socket message to the corresponding client.
> +2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the
> +vhost-pci driver fails to add the frontend device to its support list. The
> +server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket
> +message to the corresponding client.
> +
> +1.5.2 Driver Requirements: Device Initialization
> +The vhost-pci driver SHOULD acknowledge the vhost-pci device via the control
> +receiveq whether or not it succeeds in handling the received controlq message.
> +The vhost-pci driver MUST NOT accept any feature bits that are not offered
> +by the remote feature bits.
> +
> +When the vhost-pci driver receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD)
> +controlq message, it MUST initialize the corresponding driver interfaces of
> +the device type if they are not initialized, and add the device id to the
> +support list that records all the frontend devices being supported by
> +vhost-pci for inter-VM communications.
> +
> +1.6 Device Operation
> +1.6.1 Device Requirements: Device Operation
> +1.6.1.1 The Frontend Device (Client)
> +When the frontend device changes any info (e.g. device_id, virtq address)
> +that it has sent to the vhost-pci device, it MUST send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket message to the server. The
> +vhost-pci device SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) controlq
> +message to the control transmitq, and inject a controlq interrupt to the VM.
> +
> +When the frontend virtio device is removed (e.g. being hot-plugged out), the
> +client SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to the
> +server.
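
Going back to the feature bits rule in 1.5.2 above ("MUST NOT accept any
feature bits that are not offered"): the usual way to satisfy that kind of
rule is a bitwise AND of the offered bits with what the driver supports. A
minimal sketch of that reading follows, together with the client-side
comparison from item 5 of 1.5.1.1. Both function names are hypothetical and
the patch does not prescribe this implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical driver-side helper: accept only bits that the remote
 * (frontend) side offered and that the vhost-pci driver supports. */
static uint64_t vhost_pci_accept_features(uint64_t remote_offered,
                                          uint64_t driver_supported)
{
    uint64_t accepted = remote_offered & driver_supported;

    /* Invariant from 1.5.2: nothing outside the offered set is accepted. */
    assert((accepted & ~remote_offered) == 0);
    return accepted;
}

/* Hypothetical client-side helper: per 1.5.1.1 item 5, the client has to
 * re-negotiate with its own driver when the accepted bits differ from the
 * bits it sent. */
static bool vhost_pci_client_must_renegotiate(uint64_t sent, uint64_t accepted)
{
    return accepted != sent;
}
```

With this reading the accepted set is always a subset of what the client
sent, so the only mismatch the client can see is a reduced set.
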
> +
> +Before the driver VM is destroyed or migrated, all the clients that connect to
> +the server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to
> +the server. The destroying or migrating activity MUST wait until all the
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket messages are received.
> +
> +When the driver VM hot-adds or hot-removes memory, it SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message or
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message to the server.
> +
> +1.6.1.2 The Vhost-pci Device (Server)
> +When the server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message, it SHOULD put a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) message to the control transmitq,
> +and inject a controlq interrupt to the VM. It may receive the following ACK
> +controlq messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully updated the device info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE) socket message to the
> +corresponding client.
> +2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to update the device info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket message to the
> +corresponding client.
> +3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE): It indicates that the driver
> +has successfully removed the vhost-pci support for the frontend device. The
> +server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket
> +message to the corresponding client.
> +4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL): It indicates that the driver
> +fails to remove the vhost-pci support for the frontend device. The server
> +SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL) socket message to
> +the corresponding client.
> +
> +When there is no client of a driver VM connecting to the vhost-pci device,
> +the server SHOULD destroy the vhost-pci device for that driver VM.
> +
> +When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message,
> +it calculates the total size of the received memory. If the new memory size
> +plus the mapped memory size is smaller than the address space size reserved by
> +the bar, the server SHOULD map the new memory and expose it to the VM via the
> +QEMU MemoryRegion mechanism. Then it SHOULD put a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) controlq message to the control transmitq,
> +and inject a controlq interrupt to the VM.
> +
> +If the new memory size plus the mapped memory size is larger than the address
> +space size reserved by the bar, the server SHOULD
> +1. clone out a new vhost-pci device;
> +2. configure the bar size to be double of the current memory size; and
> +3. hot-plug out the old vhost-pci device, and hot-plug in the new vhost-pci
> +device to the VM.
> +
> +The initialization steps SHOULD follow 1.5 Device Initialization, except the
> +interaction messages between the server and client are not needed.
> +
> +The server may receive the following two memory info add related ACK controlq
> +messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully added the new memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the corresponding
> +client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to add the new memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the corresponding
> +client.
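
The memory hot-add rule above (map in place if it fits in the bar, otherwise
replace the device with one that has a larger bar) can be sketched as a small
decision function. This is only an illustration under stated assumptions:
the names are hypothetical, the case where the total exactly equals the bar
size is treated as fitting (the patch only covers "smaller" and "larger"),
and "double of the current memory size" is read as double the total after
the add.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical outcome of handling VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD). */
enum bar_action {
    BAR_MAP_IN_PLACE,   /* new memory fits in the reserved bar space */
    BAR_REPLACE_DEVICE  /* hot-unplug old device, hot-plug one with a larger bar */
};

struct bar_decision {
    enum bar_action action;
    uint64_t bar_size;  /* bar size to use after the decision, in bytes */
};

/* Hypothetical helper implementing the size check described in 1.6.1.2. */
static struct bar_decision
vhost_pci_plan_memory_add(uint64_t bar_size, uint64_t mapped_size,
                          uint64_t new_size)
{
    struct bar_decision d;
    uint64_t total;

    assert(mapped_size <= bar_size);
    assert(new_size <= UINT64_MAX - mapped_size); /* total must not overflow */
    total = mapped_size + new_size;

    if (total <= bar_size) {
        /* Assumption: an exact fit counts as fitting. */
        d.action = BAR_MAP_IN_PLACE;
        d.bar_size = bar_size;
    } else {
        assert(total <= UINT64_MAX / 2); /* doubling must not overflow */
        d.action = BAR_REPLACE_DEVICE;
        d.bar_size = 2 * total; /* assumption: double the post-add total */
    }
    return d;
}
```

For example, with an 8 GiB bar and 4 GiB mapped, adding 2 GiB maps in place,
while adding 6 GiB asks for a replacement device with a 20 GiB bar. PCI BAR
sizes are powers of two, so a real device model would presumably round that
up; the sketch leaves rounding out.
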
> +
> +When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message,
> +it SHOULD put a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) controlq message to the
> +control transmitq, and inject a controlq interrupt to the VM. The server may
> +receive the following two memory ACK controlq messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE): It indicates that the driver
> +has successfully deleted the memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE) socket message to the
> +corresponding client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL): It indicates that the driver
> +fails to delete the memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL) message to the corresponding
> +client.
> +
> +1.6.2 Driver Requirements: Device Operation
> +The vhost-pci driver SHOULD ensure that all the CPUs are notified of the
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) and VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL)
> +controlq messages before acknowledging the server.
> --
> 1.8.3.1
>