On Tue, Feb 22, 2022 at 4:56 PM Eugenio Perez Martin <eperezma@xxxxxxxxxx> wrote:
>
> On Tue, Feb 22, 2022 at 8:26 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >
> >
> > On 2022/2/21 4:15 PM, Eugenio Perez Martin wrote:
> > > On Mon, Feb 21, 2022 at 8:44 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >>
> > >> On 2022/2/17 8:48 PM, Eugenio Perez Martin wrote:
> > >>> On Tue, Feb 8, 2022 at 9:16 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >>>> On 2022/2/1 7:25 PM, Eugenio Perez Martin wrote:
> > >>>>> On Sun, Jan 30, 2022 at 7:47 AM Jason Wang <jasowang@xxxxxxxxxx> wrote:
> > >>>>>> On 2022/1/22 4:27 AM, Eugenio Pérez wrote:
> > >>>>>>> @@ -272,6 +590,28 @@ void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd)
> > >>>>>>>  void vhost_svq_stop(VhostShadowVirtqueue *svq)
> > >>>>>>>  {
> > >>>>>>>      event_notifier_set_handler(&svq->svq_kick, NULL);
> > >>>>>>> +    g_autofree VirtQueueElement *next_avail_elem = NULL;
> > >>>>>>> +
> > >>>>>>> +    if (!svq->vq) {
> > >>>>>>> +        return;
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    /* Send all pending used descriptors to guest */
> > >>>>>>> +    vhost_svq_flush(svq, false);
> > >>>>>> Do we need to wait for all the pending descriptors to be completed here?
> > >>>>>>
> > >>>>> No, this function does not wait, it only completes the forwarding of
> > >>>>> the *used* descriptors.
> > >>>>>
> > >>>>> The best example is the net rx queue in my opinion. This call will
> > >>>>> check SVQ's vring used_idx and will forward the last used descriptors
> > >>>>> if any, but all available descriptors will remain as available for
> > >>>>> qemu's VQ code.
> > >>>>>
> > >>>>> To skip it would miss those last rx descriptors in migration.
> > >>>>>
> > >>>>> Thanks!
> > >>>> So it's probably not the best place to ask. It's more about the
> > >>>> inflight descriptors, so it should be TX instead of RX.
> > >>>>
> > >>>> I can imagine that in the last phase of migration we should stop the
> > >>>> vhost-vDPA device before calling vhost_svq_stop(). Then we should be
> > >>>> fine regardless of inflight descriptors.
> > >>>>
> > >>> I think I'm still missing something here.
> > >>>
> > >>> To be on the same page: regarding tx, this could cause repeated tx
> > >>> frames (one at the source and another at the destination), but never a
> > >>> missed buffer left untransmitted. The "stop before" could be
> > >>> interpreted as "SVQ is not forwarding available buffers anymore".
> > >>> Would that work?
> > >>
> > >> Right, but this only works if
> > >>
> > >> 1) a flush makes sure TX DMA for all inflight descriptors is completed
> > >>
> > >> 2) all inflight descriptors are just marked as used
> > >>
> > > It currently relies on the reverse: buffers not marked as used (by the
> > > device) will be available in the destination, so expect
> > > retransmissions.
> > >
> > >
> > I may have missed something, but I think we do migrate last_avail_idx.
> > So there won't be a re-transmission, since we depend on qemu virtqueue
> > code to deal with the vring base?
> >
> >
> On stop, vhost_virtqueue_stop calls vhost_vdpa_get_vring_base. In SVQ
> mode, it returns last_used_idx. After that, the vhost.c code sets the
> VirtQueue last_avail_idx == last_used_idx, and it's migrated after that
> if I'm not wrong.

Ok, I missed these details in the review. I suggest mentioning this in
the change log and adding a comment in vhost_vdpa_get_vring_base().

>
> vhost kernel migrates last_avail_idx, but it makes rx buffers
> available on-demand, unlike SVQ. So it does not need to unwind buffers
> or anything like that.
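To make the comment suggestion above concrete, the SVQ branch of
vhost_vdpa_get_vring_base() I have in mind would look more or less like
the sketch below. Only the comment is the point here: shadow_vqs_enabled,
the shadow_vqs array and vhost_svq_last_used_idx() are placeholders for
whatever the series actually names these, not the real code.

static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                     struct vhost_vring_state *ring)
{
    struct vhost_vdpa *v = dev->opaque;

    if (v->shadow_vqs_enabled) {
        VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
                                                      ring->index);

        /*
         * SVQ reports its own used index as the vring base: every buffer
         * the guest made available but the device has not used yet will
         * be exposed as available again at the destination, instead of
         * being migrated as in-flight.
         */
        ring->num = vhost_svq_last_used_idx(svq);
        return 0;
    }

    /* Without SVQ, forward the request to the device as usual. */
    return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
}

vhost_virtqueue_stop() then stores ring->num via
virtio_queue_set_last_avail_idx(), which is the value that ends up being
migrated, as you describe above.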
> Because of how SVQ works with the rx queue, this is not possible, since
> the destination will find no available buffers for rx. And for tx you
> have already described the scenario.
>
> In other words, we cannot see SVQ as a vhost device in that regard: SVQ
> looks for a total drain (as in "make all the guest's buffers available
> to the device ASAP") vs. the vhost device, which can live with a lot of
> available ones and will use them on demand. Same problem as masking. So
> the difference in behavior is justified in my opinion, and it can be
> improved in the future with the vdpa in-flight descriptors.
>
> If we restore the state that way in a virtio-net device, it will see
> the available ones as expected, not as in-flight.
>
> Another possibility is to transform all of these into in-flight ones,
> but I feel it would create problems. Can we migrate all rx queues as
> in-flight, with 0 bytes written? Is it worth it?

To clarify, by inflight I meant from the device's point of view, that is,
[last_used_idx, last_avail_idx). So for RX and SVQ it should be as simple
as stopping the forwarding of buffers, since last_used_idx should be the
same as last_avail_idx in this case. (Though technically the rx buffer
might be modified by the NIC.)

> I didn't investigate that path too much, but I think the virtio-net
> emulated device does not support that at the moment. If I'm not wrong,
> we should copy something like the body of virtio_blk_load_device if we
> want to go that route.
>
> The current approach might be too net-centric, so let me know if this
> behavior is unexpected or if we can do better otherwise.

It should be fine to start from a networking device. We can add more in
the future if it is needed.

Thanks

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > > Thanks!
> > >
> > >> Otherwise there could be buffers that are inflight forever.
> > >>
> > >> Thanks
> > >>
> > >>
> > >>> Thanks!
> > >>>
> > >>>> Thanks
> > >>>>
> > >>>>
> > >>>>>> Thanks
> > >>>>>>
> > >>>>>>
> > >>>>>>> +
> > >>>>>>> +    for (unsigned i = 0; i < svq->vring.num; ++i) {
> > >>>>>>> +        g_autofree VirtQueueElement *elem = NULL;
> > >>>>>>> +        elem = g_steal_pointer(&svq->ring_id_maps[i]);
> > >>>>>>> +        if (elem) {
> > >>>>>>> +            virtqueue_detach_element(svq->vq, elem, elem->len);
> > >>>>>>> +        }
> > >>>>>>> +    }
> > >>>>>>> +
> > >>>>>>> +    next_avail_elem = g_steal_pointer(&svq->next_guest_avail_elem);
> > >>>>>>> +    if (next_avail_elem) {
> > >>>>>>> +        virtqueue_detach_element(svq->vq, next_avail_elem,
> > >>>>>>> +                                 next_avail_elem->len);
> > >>>>>>> +    }
> > >>>>>>>  }
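P.S. To spell out the [last_used_idx, last_avail_idx) definition above in
code form, this is the whole of it (an illustrative helper, not something
that exists in the tree):

#include <stdint.h>

/*
 * Descriptors the device still owns: made available by the driver but
 * not returned as used yet.  Both indexes are free-running 16-bit
 * counters, so the unsigned subtraction handles wraparound.
 */
static inline uint16_t vring_inflight(uint16_t last_avail_idx,
                                      uint16_t last_used_idx)
{
    return (uint16_t)(last_avail_idx - last_used_idx);
}

With the vring base handling described earlier in the thread, the
destination starts from last_avail_idx == last_used_idx, i.e. an empty
window: every not-yet-used buffer shows up as available again, which is
the re-exposure (and possible tx retransmission) discussed above.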