On Thu, May 12, 2022 at 3:44 AM Eugenio Perez Martin <eperezma@xxxxxxxxxx> wrote:
>
> This is a proposal to restore the state of the vhost-vdpa device at
> the destination after a live migration. It uses as many available
> features both from the device and from qemu as possible so we keep
> the communication simple and speed up the merging process.

When we finalize the design, we can formalize it in the kernel's
Documentation/.

>
> # Initializing a vhost-vdpa device.
>
> Without the context of live migration, the steps to initialize the
> device from vhost-vdpa at qemu startup are:
> 1) [vhost] Open the vdpa device, simply using open().
> 2) [vhost+virtio] Get the device features. These are expected not to
> change in the device's lifetime, so we can save them. Qemu issues a
> VHOST_GET_FEATURES ioctl and vdpa forwards it to the backend driver
> using the get_device_features() callback.

For "virtio", do you mean it's an action that is defined in the spec?

> 3) [vhost+virtio] Get its max_queue_pairs if _F_MQ and _F_CTRL_VQ.
> This is obtained using VHOST_VDPA_GET_CONFIG, and that request is
> forwarded to the device using get_config. QEMU expects the device
> not to change it in its lifetime.
> 4) [vhost] Vdpa set status (_S_ACKNOWLEDGE, _S_DRIVER). Still no
> FEATURES_OK or DRIVER_OK. The ioctl is VHOST_VDPA_SET_STATUS, and
> the vdpa backend driver callback is set_status.
>
> These are the steps used to initialize the device in qemu
> terminology, taking away some redundancies to make it simpler.
>
> Now the driver sends FEATURES_OK and DRIVER_OK, qemu detects it, and
> so it *starts* the device.
>
> # Starting a vhost-vdpa device
>
> At virtio_net_vhost_status we have two important variables here:
> int cvq = _F_CTRL_VQ ? 1 : 0;
> int queue_pairs = _F_CTRL_VQ && _F_MQ ? (max_queue_pairs of step 3) : 0;
>
> Next comes the identification of the cvq index. Qemu *knows* that
> the device will expose it at the last queue (max_queue_pairs*2) if
> _F_MQ has been acknowledged by the guest's driver, or at 2 if not.
> It cannot depend on any data sent to the device via the cvq, because
> we couldn't get its command status on a change.
>
> Now we start the vhost device. The workflow is currently:
>
> 5) [virtio+vhost] The first step is to send the acknowledgement of
> the virtio features and vhost/vdpa backend features to the device,
> so it knows how to configure itself. This is done using the same
> calls as step 4 with these feature bits added.
> 6) [virtio] Set the size, base, addr, kick and call fd for each
> queue (SET_VRING_ADDR, SET_VRING_NUM, ...; forwarded with
> set_vq_address, set_vq_state, ...).
> 7) [vdpa] Set up the host notifiers and *send SET_VRING_ENABLE = 1*
> for each queue. This is done using the VHOST_VDPA_SET_VRING_ENABLE
> ioctl, and forwarded to the vdpa backend using the set_vq_ready
> callback.
> 8) [virtio + vdpa] Send the memory translations & set DRIVER_OK.
>
> If we follow the current workflow, the device is now allowed to
> start receiving only on vq pair 0, since we have still not set the
> multiqueue state. This could cause the guest to receive packets in
> unexpected queues, breaking RSS.
>
> # Proposal
>
> Our proposal diverges at step 7: instead of enabling *all* the
> virtqueues, only enable the CVQ. After that, send the DRIVER_OK and
> queue all the control commands needed to restore the device state
> (MQ, RSS, ...). Once all of them have been acknowledged (i.e. the
> "device", or the emulated cvq in the host vdpa backend driver, has
> used all the cvq buffers), enable (SET_VRING_ENABLE, set_vq_ready)
> all the other queues.
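To make the flows above concrete, this is how I read steps 1)-4) in
terms of the vhost-vdpa uAPI. It is only an illustrative sketch: the
device path is an example, error handling and cleanup are omitted,
and reading max_virtqueue_pairs assumes _F_MQ was offered. It is not
code taken from qemu:

#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>
#include <linux/virtio_net.h>

int vdpa_init(void)
{
    uint64_t features;
    uint8_t status;
    /* Room for the 8-byte vhost_vdpa_config header + 2-byte payload. */
    uint32_t buf[3] = { 0 };
    struct vhost_vdpa_config *cfg = (struct vhost_vdpa_config *)buf;
    uint16_t max_virtqueue_pairs = 0;

    /* 1) Open the vdpa char device (path is just an example). */
    int fd = open("/dev/vhost-vdpa-0", O_RDWR);
    if (fd < 0)
        return -1;

    /* 2) Get the (immutable) device features. */
    if (ioctl(fd, VHOST_GET_FEATURES, &features) < 0)
        return -1;

    /* 3) Read max_virtqueue_pairs from the virtio-net config space. */
    cfg->off = offsetof(struct virtio_net_config, max_virtqueue_pairs);
    cfg->len = sizeof(max_virtqueue_pairs);
    if (ioctl(fd, VHOST_VDPA_GET_CONFIG, cfg) < 0)
        return -1;
    memcpy(&max_virtqueue_pairs, cfg->buf, sizeof(max_virtqueue_pairs));

    /* 4) ACKNOWLEDGE + DRIVER, but no FEATURES_OK or DRIVER_OK yet. */
    status = VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER;
    if (ioctl(fd, VHOST_VDPA_SET_STATUS, &status) < 0)
        return -1;

    return fd;
}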
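And the change proposed for step 7) would then look roughly like the
following in the start path. Again only a sketch: the two
vhost_net_cvq_*() helpers are invented placeholders for queuing the
restore commands on the cvq and waiting for the device to use all of
their buffers; they are not existing qemu functions:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/vhost.h>
#include <linux/virtio_config.h>

/* Placeholders: replay MQ/RSS/... commands and wait for completion. */
int vhost_net_cvq_restore(int fd);
int vhost_net_cvq_wait_used(int fd);

int set_vring_enable(int fd, unsigned int index, unsigned int on)
{
    struct vhost_vring_state state = { .index = index, .num = on };

    return ioctl(fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
}

int start_and_restore(int fd, unsigned int data_queue_pairs,
                      unsigned int cvq_index)
{
    uint8_t status;
    unsigned int i;

    /* 7') Enable only the control virtqueue. */
    if (set_vring_enable(fd, cvq_index, 1) < 0)
        return -1;

    /* 8) DRIVER_OK: the device may start processing the cvq. */
    if (ioctl(fd, VHOST_VDPA_GET_STATUS, &status) < 0)
        return -1;
    status |= VIRTIO_CONFIG_S_DRIVER_OK;
    if (ioctl(fd, VHOST_VDPA_SET_STATUS, &status) < 0)
        return -1;

    /* Restore MQ, RSS, ... through the cvq and wait for the buffers. */
    if (vhost_net_cvq_restore(fd) < 0 || vhost_net_cvq_wait_used(fd) < 0)
        return -1;

    /* Only now enable the data virtqueues. */
    for (i = 0; i < data_queue_pairs * 2; i++) {
        if (set_vring_enable(fd, i, 1) < 0)
            return -1;
    }

    return 0;
}

With this ordering the device never sees a data queue enabled before
the MQ/RSS state has been restored, which is the point of the
proposal.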
>
> Everything needed for this is already implemented in the kernel as
> far as I see; there is only a small modification needed in qemu.
> This restores the device state without creating a maintenance
> burden.

Yes, one of the major motivations is to try to reuse the existing
APIs as much as possible as a start. It doesn't mean we can't invent
a new API; having a dedicated save/restore uAPI looks fine. But that
looks more like work that needs to be finalized in the virtio spec
first, otherwise we may end up with code that is hard to maintain.

Thanks

>
> A lot of optimizations can be applied on top without the need to
> add stuff to the migration protocol or the vDPA uAPI, like
> pre-warming the vdpa queues or adding more capabilities to the
> emulated CVQ.
>
> Other optimizations, like applying the state out of band, can also
> be added so they run in parallel with the migration, but that
> requires a bigger change in the qemu migration protocol, making us
> lose focus on achieving at least the basic device migration, in my
> opinion.
>
> Thoughts?
> _______________________________________________
> Virtualization mailing list
> Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization