On Sun, 27 Oct 2024 12:07:44 +0200
Yishai Hadas <yishaih@xxxxxxxxxx> wrote:
>
> - According to the Virtio specification, a device has only two states:
>   RUNNING and STOPPED. Consequently, certain VFIO transitions (e.g.,
>   RUNNING_P2P->STOP, STOP->RUNNING_P2P) are treated as no-ops. When
>   transitioning to RUNNING_P2P, the device state is set to STOP and
>   remains STOPPED until it transitions back from RUNNING_P2P->RUNNING, at
>   which point it resumes its RUNNING state.

Does this assume the virtio device is not a DMA target for another
device? If so, how can we make such an assumption? Otherwise, what
happens on a DMA write to the stopped virtio device?

> - Furthermore, the Virtio specification does not support reading partial
>   or incremental device contexts. This means that during the PRE_COPY
>   state, the vfio-virtio driver reads the full device state. This step is
>   beneficial because it allows the device to send some "initial data"
>   before moving to the STOP_COPY state, thus reducing downtime by
>   preparing early. To avoid an infinite number of device calls during
>   PRE_COPY, the vfio-virtio driver limits this flow to a maximum of 128
>   calls. After reaching this limit, the driver will report zero bytes
>   remaining in PRE_COPY, signaling to QEMU to transition to STOP_COPY.

If the virtio spec doesn't support partial contexts, what makes it
beneficial here? Can you qualify to what extent this initial data
improves the overall migration performance?

If it is beneficial, why is it beneficial to send initial data more than
once? In particular, what heuristic supports capping iterations at 128?
The code also only indicates this is to prevent infinite iterations.

Would it be better to rate-limit calls, by reporting no data available
for some time interval after the previous call? Thanks,

Alex
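
For illustration, the rate-limiting alternative raised above could be
sketched roughly as follows. This is a minimal sketch and not code from
the vfio-virtio driver: the struct, its fields, and precopy_may_read()
are hypothetical names, and the caller is assumed to supply a monotonic
timestamp in nanoseconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-migration state; not taken from the vfio-virtio driver. */
struct precopy_limiter {
	uint64_t last_read_ns;    /* time of the last full device-context read */
	uint64_t min_interval_ns; /* minimum spacing between two reads */
	bool has_read;            /* false until the first read has happened */
};

/*
 * Return true if enough time has passed to read the device context again.
 * When this returns false, the caller would report no data available in
 * PRE_COPY instead of issuing another device call.
 */
static bool precopy_may_read(struct precopy_limiter *l, uint64_t now_ns)
{
	if (l->has_read && now_ns - l->last_read_ns < l->min_interval_ns)
		return false;

	/* Record this read so later calls are measured against it. */
	l->has_read = true;
	l->last_read_ns = now_ns;
	return true;
}
```

Compared with a fixed cap of 128 calls, a limiter like this bounds the
rate of device calls rather than their total number, so the choice of
interval would still need the same kind of justification as the cap.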