Re: [PATCH v14 vfio 5/8] vfio/pds: Add VFIO live migration support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/8/2023 3:27 PM, Alex Williamson wrote:
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.


On Mon, 7 Aug 2023 13:57:52 -0700
Brett Creeley <brett.creeley@xxxxxxx> wrote:
...
+static int
+pds_vfio_suspend_wait_device_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+     union pds_core_adminq_cmd cmd = {
+             .lm_suspend_status = {
+                     .opcode = PDS_LM_CMD_SUSPEND_STATUS,
+                     .vf_id = cpu_to_le16(pds_vfio->vf_id),
+             },
+     };
+     struct device *dev = pds_vfio_to_dev(pds_vfio);
+     union pds_core_adminq_comp comp = {};
+     unsigned long time_limit;
+     unsigned long time_start;
+     unsigned long time_done;
+     int err;
+
+     time_start = jiffies;
+     time_limit = time_start + HZ * SUSPEND_TIMEOUT_S;
+     do {
+             err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, true);
+             if (err != -EAGAIN)
+                     break;
+
+             msleep(SUSPEND_CHECK_INTERVAL_MS);
+     } while (time_before(jiffies, time_limit));
+
+     time_done = jiffies;
+     dev_dbg(dev, "%s: vf%u: Suspend comp received in %d msecs\n", __func__,
+             pds_vfio->vf_id, jiffies_to_msecs(time_done - time_start));
+
+     /* Check the results */
+     if (time_after_eq(time_done, time_limit)) {
+             dev_err(dev, "%s: vf%u: Suspend comp timeout\n", __func__,
+                     pds_vfio->vf_id);
+             err = -ETIMEDOUT;

If the command completes successfully but exceeds the time limit
this turns a success into a failure.  Is that desired?  Thanks,

Alex


Yes, this is the desired behavior. Based on the testing we have done and downtime expectations for live migration it seems like the suspend operation timing out after 5 seconds is reasonable behavior. It could succeed, but that could also be the case if the timeout value was 10, 20, 30, or more seconds. Even so, it seems like we don't want to keep spinning in the driver forever and we also don't want to further increase the VM downtime.

Thanks,

Brett



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux