RE: [PATCH 4/6 Resend] Vhost-pci RFC: Detailed Description in the Virtio Specification Format


 



On Wed 6/1/2016 4:15 PM, Xiao Guangrong wrote:
> On 05/29/2016 04:11 PM, Wei Wang wrote:
> > Signed-off-by: Wei Wang <wei.w.wang@xxxxxxxxx>
> > ---
> >   Details | 324 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 324 insertions(+)
> >   create mode 100644 Details
> >
> > diff --git a/Details b/Details
> > new file mode 100644
> > index 0000000..4ea2252
> > --- /dev/null
> > +++ b/Details
> > @@ -0,0 +1,324 @@
> > +1 Device ID
> > +TBD
> > +
> > +2 Virtqueues
> > +0 controlq
> > +
> > +3 Feature Bits
> > +3.1 Local Feature Bits
> > +Currently no local feature bits are defined, so the standard virtio
> > +feature bits negotiation will always be successful and complete.
> > +
> > +3.2 Remote Feature Bits
> > +The remote feature bits are obtained from the frontend virtio device
> > +and negotiated with the vhost-pci driver via the controlq. The
> > +negotiation steps are described in 4.5 Device Initialization.
> > +
> > +4 Device Configuration Layout
> > +struct vhost_pci_config {
> > +	#define VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK 0
> > +	#define VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK 1
> > +	#define VHOST_PCI_CONTROLQ_FEATURE_BITS_ACK 2
> > +	u32 ack_type;
> > +	u32 ack_device_type;
> > +	u64 ack_device_id;
> > +	union {
> > +		#define VHOST_PCI_CONTROLQ_ACK_ADD_DONE 0
> > +		#define VHOST_PCI_CONTROLQ_ACK_ADD_FAIL 1
> > +		#define VHOST_PCI_CONTROLQ_ACK_DEL_DONE 2
> > +		#define VHOST_PCI_CONTROLQ_ACK_DEL_FAIL 3
> > +		u64 ack_memory_info;
> > +		u64 ack_device_info;
> > +		u64 ack_feature_bits;
> > +	};
> > +};
> 
> Do you need to write all these 4 fields to ack the operation? It seems it is not
> efficient, and it is not flexible if the driver needs to offer more data to the device
> in the future. Can we dedicate a vq for this purpose?

Yes, all the 4 fields are required to be written. The vhost-pci server usually connects to multiple clients, and the "ack_device_type" and "ack_device_id" fields are used to identify them.

Agree, another controlq for the guest->host direction looks better, and the above fields can be converted into the controlq message header.
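For example (just a rough sketch of the idea, not part of this patch; the structure and field names are tentative), the header of such a guest->host controlq message could carry those fields plus a payload length:

struct vhost_pci_g2h_controlq_msg_hdr {
	u32 type;		/* e.g. VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK */
	u32 device_type;	/* identifies the client the message refers to */
	u64 device_id;
	u32 payload_len;	/* length of the message payload that follows */
};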

> 
> BTW, the current approach cannot handle the case where there are multiple requests
> of the same kind in the control queue, e.g., if there are two memory-add requests in
> the control queue.

A vhost-pci device corresponds to one driver VM. The two memory-add requests on the controlq are both for the same driver VM; memory-add requests for different driver VMs cannot be present on the same controlq. I haven't seen the issue yet. Can you please explain more? Thanks.


> > +
> > +The configuration fields are currently used for the vhost-pci driver
> > +to acknowledge to the vhost-pci device after it receives controlq messages.
> > +
> > +4.5 Device Initialization
> > +When a device VM boots, it creates a vhost-pci server socket.
> > +
> > +When a virtio device on the driver VM is created and specified to use
> > +a vhost-pci device as its backend, a client socket is created
> > +and connected to the corresponding vhost-pci server for message exchanges.
> > +
> > +The messages passed to the vhost-pci server are preceded by the
> > +following header:
> > +struct vhost_pci_socket_hdr {
> > +	#define VHOST_PCI_SOCKET_MEMORY_INFO 0
> > +	#define VHOST_PCI_SOCKET_MEMORY_INFO_ACK 1
> > +	#define VHOST_PCI_SOCKET_DEVICE_INFO 2
> > +	#define VHOST_PCI_SOCKET_DEVICE_INFO_ACK 3
> > +	#define VHOST_PCI_SOCKET_FEATURE_BITS 4
> > +	#define VHOST_PCI_SOCKET_FEATURE_BITS_ACK 5
> > +	u16 msg_type;
> > +	u16 msg_version;
> > +	u32 msg_len;
> > +	u64 qemu_pid;
> > +};
> > +
> > +The payload of the above message types can be constructed using the
> > +structures below:
> > +/* VHOST_PCI_SOCKET_MEMORY_INFO message */
> > +struct vhost_pci_socket_memory_info {
> > +	#define VHOST_PCI_ADD_MEMORY 0
> > +	#define VHOST_PCI_DEL_MEMORY 1
> > +	u16 ops;
> > +	u32 nregions;
> > +	struct vhost_pci_memory_region {
> > +		int fd;
> > +		u64 guest_phys_addr;
> > +		u64 memory_size;
> > +		u64 mmap_offset;
> > +	} regions[VHOST_PCI_MAX_NREGIONS];
> > +};
> > +
> > +/* VHOST_PCI_SOCKET_DEVICE_INFO message */
> > +struct vhost_pci_device_info {
> > +	#define VHOST_PCI_ADD_FRONTEND_DEVICE 0
> > +	#define VHOST_PCI_DEL_FRONTEND_DEVICE 1
> > +	u16    ops;
> > +	u32    nvirtq;
> > +	#define VHOST_PCI_FRONTEND_DEVICE_NET 1
> > +	#define VHOST_PCI_FRONTEND_DEVICE_BLK 2
> > +	#define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
> > +	#define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
> > +	#define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
> > +	#define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
> > +	u32    device_type;
> > +	u64    device_id;
> > +	struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
> > +};
> > +The device_id field identifies the device. For example, it can be
> > +used to store a MAC address if the device_type is
> VHOST_PCI_FRONTEND_DEVICE_NET.
> > +
> > +/* VHOST_PCI_SOCKET_FEATURE_BITS message */
> > +struct vhost_pci_feature_bits {
> > +	u64 feature_bits;
> > +};
> 
> We not only have 'socket feature bits' but also per-device feature bits for each
> virtio device plugged in on the vhost-pci device side.

Yes. It is mentioned in "3 Feature Bits". The socket feature bits here are actually the remote feature bits (obtained from a socket message).

> 
> E.g.: if there are two virtio devices (e.g., a NIC and a BLK device), both of them need to
> communicate directly with another VM. The feature bits of these two devices
> need to be negotiated with that VM separately. And you cannot put these
> feature bits in the vhost_pci_device_info struct, as its vq is not created at that time.

Right. If you check the initialization steps below, there is a statement "When the device status is updated with DRIVER_OK"; the feature bits are only negotiated over the controlq after that point.

> > +
> > +/* VHOST_PCI_SOCKET_xx_ACK messages */ struct vhost_pci_socket_ack {
> > +	#define VHOST_PCI_SOCKET_ACK_ADD_DONE 0
> > +	#define VHOST_PCI_SOCKET_ACK_ADD_FAIL 1
> > +	#define VHOST_PCI_SOCKET_ACK_DEL_DONE 2
> > +	#define VHOST_PCI_SOCKET_ACK_DEL_FAIL 3
> > +	u64 ack;
> > +};
> > +
> > +The driver update message passed via the controlq is preceded by the
> > +following
> > +header:
> > +struct vhost_pci_controlq_hdr {
> > +	#define VHOST_PCI_CONTROLQ_MEMORY_INFO 0
> > +	#define VHOST_PCI_CONTROLQ_DEVICE_INFO 1
> > +	#define VHOST_PCI_CONTROLQ_FEATURE_BITS 2
> > +	#define VHOST_PCI_CONTROLQ_UPDATE_DONE 3
> > +	u16 msg_type;
> > +	u16 msg_version;
> > +	u32 msg_len;
> > +};
> > +
> > +The payload of a VHOST_PCI_CONTROLQ_MEMORY_INFO message can be
> > +constructed using the following structure:
> > +/* VHOST_PCI_CONTROLQ_MEMORY_INFO message */
> > +struct vhost_pci_controlq_memory_info {
> > +	#define VHOST_PCI_ADD_MEMORY 0
> > +	#define VHOST_PCI_DEL_MEMORY 1
> > +	u16  ops;
> > +	u32 nregion;
> > +	struct exotic_memory_region {
> > +		u64   region_base_xgpa;
> > +		u64   size;
> > +		u64   offset_in_bar_area;
> > +	} region[VHOST_PCI_MAX_NREGIONS];
> > +};
> > +
> > +The payload of VHOST_PCI_CONTROLQ_DEVICE_INFO and
> > +VHOST_PCI_CONTROLQ_FEATURE_BITS messages can be constructed using the
> > +vhost_pci_device_info structure and the vhost_pci_feature_bits
> > +structure respectively.
> > +
> > +The payload of a VHOST_PCI_CONTROLQ_UPDATE_DONE message can be
> > +constructed using the structure below:
> > +struct vhost_pci_controlq_update_done {
> > +	u32    device_type;
> > +	u64    device_id;
> > +};
> > +
> > +Fig. 1 shows the initialization steps.
> > +
> > +When the vhost-pci server receives a
> > +VHOST_PCI_SOCKET_MEMORY_INFO(ADD) message, it checks if a vhost-pci
> > +device has been created for the requesting VM whose QEMU process id
> > +is qemu_pid. If yes, it simply relays the subsequently received
> > +messages to the vhost-pci driver via the controlq. Otherwise, the
> > +server creates a new vhost-pci device, and continues the following
> > +initialization steps.
> 
> 
> qemu_pid is not stable, as the existing VM may be killed silently and a new
> vhost-pci driver reusing the same qemu_pid may ask to join before the
> vhost-pci device gets to know the previous one has gone.

Would it be a normal and legal operation to silently kill a QEMU? I guess only the system admin can do that, right?

If that's true, I think we can add a new field, "u64 tsc_of_birth", to the vhost_pci_socket_hdr structure. It records the TSC value at the time the QEMU process is created.
If that's true, another problem would be the removal of the vhost-pci device for a silently killed driver VM.
The vhost-pci server may need to periodically send a check message to detect whether the driver VM has been silently killed. If that really happens, it should remove the related vhost-pci device.
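For illustration only (this is not in the current patch, just to show where the proposed field would sit), the extended header could look like:

struct vhost_pci_socket_hdr {
	u16 msg_type;
	u16 msg_version;
	u32 msg_len;
	u64 qemu_pid;
	/* proposed: TSC value sampled when the QEMU process was created,
	 * used together with qemu_pid to tell a reused pid apart */
	u64 tsc_of_birth;
};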
 
> > +
> > +The vhost-pci server adds up all the memory region sizes, and uses a
> > +64-bit device bar for the mapping of all the memory regions obtained
> > +from the socket message. To better support memory hot-plugging of the
> > +driver VM, the bar is configured to be double the size of the driver
> > +VM's memory. The server maps the received memory info via the QEMU
> > +MemoryRegion mechanism, and then the newly created vhost-pci device is
> > +hot-plugged to the VM.
> > +
> > +When the device status is updated with DRIVER_OK, a
> > +VHOST_PCI_CONTROLQ_MEMORY_INFO(ADD) message, which stems from the
> > +memory info socket message, is put on the controlq and a controlq
> > +interrupt is injected to the VM.
> > +
> > +When the vhost-pci server receives a
> > +VHOST_PCI_CONTROLQ_MEMORY_INFO_ACK(ADD_DONE) acknowledgement from the
> > +driver, it sends a VHOST_PCI_SOCKET_MEMORY_INFO_ACK(ADD_DONE) message
> > +to the client that is identified by the ack_device_type and ack_device_id fields.
> > +
> > +When the vhost-pci server receives a
> > +VHOST_PCI_SOCKET_FEATURE_BITS(feature bits) message, a
> > +VHOST_PCI_CONTROLQ_FEATURE_BITS(feature bits) message is put on the
> > +controlq and a controlq interrupt is injected to the VM.
> > +
> > +If the vhost-pci server notices that the driver fully accepted the
> > +offered feature bits, it sends a
> > +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message to the client. If
> > +the vhost-pci server notices that the vhost-pci driver only accepted
> > +a subset of the offered feature bits, it sends a
> > +VHOST_PCI_SOCKET_FEATURE_BITS(accepted feature bits) message back to
> > +the client. The client side virtio device re-negotiates the new
> > +feature bits with its driver, and sends back a
> > +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE)
> > +message to the server.
> > +
> > +Either when the vhost-pci driver fully accepted the offered feature
> > +bits or a
> > +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message is received from
> > +the client, the vhost-pci server puts a
> > +VHOST_PCI_CONTROLQ_UPDATE_DONE message on the controlq, and a
> > +controlq interrupt is injected to the VM.
> 
> Why is VHOST_PCI_CONTROLQ_UPDATE_DONE needed?

OK, this one looks redundant. We can set up the support for that frontend device directly when the device info is received via the controlq, e.g. along the lines of the sketch below.
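(Illustrative sketch only; the two helper functions are made up, while the message/ack constants and the vhost_pci_device_info fields come from the patch.)

/* Sketch: set up the frontend device as soon as DEVICE_INFO(ADD) arrives,
 * with no separate UPDATE_DONE step. */
static void vhost_pci_handle_device_info(struct vhost_pci_device_info *info)
{
	if (info->ops == VHOST_PCI_ADD_FRONTEND_DEVICE) {
		/* initialize the driver interface for this device_type and
		 * record device_id in the frontend device list (made-up helper) */
		vhost_pci_init_frontend(info->device_type, info->device_id,
					info->exotic_virtq, info->nvirtq);
		/* ack DONE/FAIL through the vhost_pci_config fields (made-up helper) */
		vhost_pci_ack(VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK,
			      info->device_type, info->device_id,
			      VHOST_PCI_CONTROLQ_ACK_ADD_DONE);
	}
}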

Best,
Wei
 
> > +
> > +When the vhost-pci server receives a
> > +VHOST_PCI_SOCKET_DEVICE_INFO(ADD) message, a
> > +VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD) message is put on the controlq
> > +and a controlq interrupt is injected to the VM.
> > +
> > +When the vhost-pci server receives a
> > +VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) acknowledgement from the
> > +driver, it sends a VHOST_PCI_SOCKET_DEVICE_INFO_ACK(ADD_DONE) message
> > +to the corresponding client.
> > +
> > +4.5.1 Device Requirements: Device Initialization
> > +To let a VM be capable of creating vhost-pci devices, a vhost-pci
> > +server MUST be created when it boots.
> > +
> > +The vhost-pci server socket path SHOULD be provided to a virtio
> > +client socket for the connection to the vhost-pci server.
> > +
> > +The virtio device MUST finish the feature bits negotiation with its
> > +driver before negotiating them with the vhost-pci device.
> > +
> > +If the client receives a VHOST_PCI_SOCKET_FEATURE_BITS(feature bits)
> > +message, it MUST reset the device to go into backward compatibility
> > +mode, re-negotiate the received feature bits with its driver, and
> > +send back a
> > +VHOST_PCI_SOCKET_FEATURE_BITS_ACK(ADD_DONE) message to the server.
> > +
> > +In any case where an acknowledgement from the vhost-pci driver
> > +indicates a FAIL, the vhost-pci server SHOULD send a FAIL socket message to
> the client.
> > +
> > +In any case where the msg_type is different between the sender and
> > +the receiver, the receiver SHOULD acknowledge a FAIL to the sender or
> > +convert the message to its version if the converted version is still functionally
> usable.
> > +
> > +4.5.2 Driver Requirements: Device Initialization
> > +The vhost-pci driver MUST NOT accept any feature bits that are not
> > +offered by the remote feature bits, and SHOULD acknowledge the
> > +accepted feature bits to the device by writing them to the
> > +vhost_pci_config fields.
> > +
> > +When the vhost-pci driver receives a VHOST_PCI_CONTROLQ_UPDATE_DONE
> > +message from the controlq, the vhost-pci driver MUST initialize the
> > +corresponding driver interface of the device_type if it has not been
> > +initialized, and add the device_id to the frontend device list that
> > +records all the frontend virtio devices being supported by vhost-pci
> > +for inter-VM communications.
> 
> Okay, I see how it is used here. But once the driver gets
> VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD), it knows how to
> communicate with the virtio device on another VM. Why do we postpone the
> initialization until it gets VHOST_PCI_CONTROLQ_UPDATE_DONE?
> 
> > +
> > +The vhost-pci driver SHOULD acknowledge to the device that the device
> > +and memory info update (add or delete) is DONE or FAIL by writing the
> > +acknowledgement (DONE or FAIL) to the vhost_pci_config fields.
> > +
> > +The vhost-pci driver MUST ensure that writes to the vhost_pci_config
> > +fields are atomic.
> > +
> > +4.6 Device Operation
> > +4.6.1 Device Requirements: Device Operation
> > +4.6.1.1 Frontend Device Info Update
> > +When the frontend virtio device changes any info (e.g. device_id,
> > +virtq
> > +address) that it has sent to the vhost-pci device, it SHOULD send a
> > +VHOST_PCI_SOCKET_DEVICE_INFO(ADD) message, which contains the new
> > +device info, to the vhost-pci server. The vhost-pci device SHOULD
> > +insert a
> > +VHOST_PCI_CONTROLQ_DEVICE_INFO(ADD) to the controlq and inject a
> > +controlq interrupt to the VM.
> > +
> > +When the vhost-pci device receives a
> > +VHOST_PCI_CONTROLQ_DEVICE_INFO_ACK(ADD_DONE) acknowledgement from the
> > +driver, it SHOULD send a VHOST_PCI_SOCKET_DEVICE_INFO_ACK(ADD_DONE)
> > +message to the client that is identified by the ack_device_type and
> > +ack_device_id fields, to indicate that the vhost-pci driver has
> > +finished the handling of the device info update.
> 
> If VHOST_PCI_CONTROLQ_UPDATE_DONE is really needed, you missed it here.

