On Tue, May 31, 2022 at 10:26 PM Parav Pandit <parav@xxxxxxxxxx> wrote:
>
> > From: Eugenio Perez Martin <eperezma@xxxxxxxxxx>
> > Sent: Friday, May 27, 2022 3:55 AM
> >
> > On Fri, May 27, 2022 at 4:26 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >
> > > On Thu, May 26, 2022 at 8:54 PM Parav Pandit <parav@xxxxxxxxxx> wrote:
> > > >
> > > > > From: Eugenio Pérez <eperezma@xxxxxxxxxx>
> > > > > Sent: Thursday, May 26, 2022 8:44 AM
> > > > >
> > > > > Implement stop operation for vdpa_sim devices, so vhost-vdpa will offer
> > > > > that backend feature and userspace can effectively stop the device.
> > > > >
> > > > > This is a must before get virtqueue indexes (base) for live migration,
> > > > > since the device could modify them after userland gets them. There are
> > > > > individual ways to perform that action for some devices
> > > > > (VHOST_NET_SET_BACKEND, VHOST_VSOCK_SET_RUNNING, ...) but there was no
> > > > > way to perform it for any vhost device (and, in particular, vhost-vdpa).
> > > > >
> > > > > After the return of ioctl with stop != 0, the device MUST finish any
> > > > > pending operations like in flight requests. It must also preserve all
> > > > > the necessary state (the virtqueue vring base plus the possible device
> > > > > specific states) that is required for restoring in the future. The
> > > > > device must not change its configuration after that point.
> > > > >
> > > > > After the return of ioctl with stop == 0, the device can continue
> > > > > processing buffers as long as typical conditions are met (vq is enabled,
> > > > > DRIVER_OK status bit is enabled, etc).
> > > >
> > > > Just to be clear, we are adding vdpa level new ioctl() that doesn’t map to
> > > > any mechanism in the virtio spec.
> > >
> > > We try to provide forward compatibility to VIRTIO_CONFIG_S_STOP. That
> > > means it is expected to implement at least a subset of
> > > VIRTIO_CONFIG_S_STOP.
> > >
> >
> > Appending a link to the proposal, just for reference [1].
> >
> > > >
> > > > Why can't we use this ioctl() to indicate driver to start/stop the device
> > > > instead of driving it through the driver_ok?
> > > >
> >
> > Parav, I'm not sure I follow you here.
> >
> > By the proposal, the resume of the device is (From qemu POV):
> > 1. To configure all data vqs and cvq (addr, num, ...)
> > 2. To enable only CVQ, not data vqs
> > 3. To send DRIVER_OK
> > 4. Wait for all buffers of CVQ to be used
> > 5. To enable all others data vqs (individual ioctl at the moment)
> >
> > Where can we fit the resume (as "stop(false)") here? If the device is stopped
> > (as if we send stop(true) before DRIVER_OK), we don't read CVQ first. If we
> > send it right after (or instead) DRIVER_OK, data buffers can reach data vqs
> > before configuring RSS.
> >
> It doesn’t make sense with currently proposed way of using cvq to replay the config.

The stop/resume part is not intended to restore the config through the
CVQ. The stop call is issued so that userspace can retrieve the vq state
(base, in vhost terminology). The symmetric operation (resume) was added
on demand; it was never intended to be part of either the config restore
or the virtqueue state restore workflow.

The configuration restore workflow was modelled after the device
initialization, so the fewer things each part needed to add, the better,
and only qemu needed to be changed. From the device POV, there is no need
to learn new tricks for this. Support for .set_vq_ready and .get_vq_ready
is already in the kernel in every vdpa backend driver.
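
To make the ordering argument concrete, here is a toy model in Python. It
is only a sketch under loud assumptions: ToyDevice, restore() and
restore_with_resume() are hypothetical names invented for illustration,
not kernel, vhost-vdpa or qemu interfaces, and the "device" is reduced to
a few flags. It compares the five-step sequence quoted above with a
variant that uses stop(false) as the start signal.

```python
class ToyDevice:
    """Hypothetical stand-in for a vdpa net device (not a real API)."""

    def __init__(self, n_data_vqs):
        self.driver_ok = False
        self.stopped = False
        self.cvq_ready = False
        self.data_ready = [False] * n_data_vqs
        self.rss_configured = False
        self.cvq_pending = []  # commands queued on the control vq
        self.misrouted = 0     # packets delivered before RSS was configured

    def _running(self):
        return self.driver_ok and not self.stopped

    def process_cvq(self):
        # CVQ buffers are consumed only while running with CVQ enabled.
        if self._running() and self.cvq_ready:
            for cmd in self.cvq_pending:
                if cmd == "rss":
                    self.rss_configured = True
            self.cvq_pending.clear()

    def receive_packet(self):
        # A packet reaches a data vq only if the device runs and one is enabled.
        if self._running() and any(self.data_ready):
            if not self.rss_configured:
                self.misrouted += 1
            return True
        return False


def restore(dev, commands):
    """The five steps from the proposal, as seen from qemu."""
    dev.cvq_pending.extend(commands)  # 1. configure vqs (addr, num: omitted)
    dev.cvq_ready = True              # 2. enable only the CVQ
    dev.driver_ok = True              # 3. DRIVER_OK
    dev.receive_packet()              #    early packet: no data vq enabled yet
    dev.process_cvq()                 # 4. wait for all CVQ buffers to be used
    dev.data_ready = [True] * len(dev.data_ready)  # 5. enable data vqs


def restore_with_resume(dev, commands):
    """Hypothetical variant: every vq enabled up front, stop(false) starts it."""
    dev.stopped = True                # stop(true) before DRIVER_OK
    dev.cvq_pending.extend(commands)
    dev.cvq_ready = True
    dev.data_ready = [True] * len(dev.data_ready)
    dev.driver_ok = True
    dev.process_cvq()                 # no-op: a stopped device does not read CVQ
    dev.stopped = False               # stop(false)
    dev.receive_packet()              # can land on a data vq before RSS is set
    dev.process_cvq()


if __name__ == "__main__":
    a, b = ToyDevice(2), ToyDevice(2)
    restore(a, ["rss"])
    restore_with_resume(b, ["rss"])
    print(a.misrouted, b.misrouted)   # prints "0 1"
```

In this model the first sequence never delivers a packet before RSS is
in place, because the data vqs only become ready after the CVQ has been
drained. The second one has a window right after stop(false) where a
packet can arrive first.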
> Need to continue with currently proposed temporary method that subsequently to be replaced with optimized flow as we discussed.

Back then, you noted that enabling each data vq individually after
DRIVER_OK is slow on mlx5 devices. The solution was to batch these enable
calls by accounting for them in the kernel, with no growth in the vdpa
uAPI layer. The proposed solution did not involve the resume operation.

After that, you proposed in this thread "Why can't we use this ioctl() to
indicate driver to start/stop the device instead of driving it through
the driver_ok?". As I understand it, that is a mistake, since it requires
the device, the vdpa layer, etc. to learn new tricks. It also requires
qemu to duplicate the initialization layer (it's now common for start and
restore config).

But I might not have seen the whole picture, and I may be missing
advantages of using the resume call for this workflow. Can you describe
the workflow you have in mind? How does that new workflow affect this
proposal? I'm ok with changing the proposal as long as we find a net
gain.

Thanks!