On 8/8/2023 3:27 PM, Alex Williamson wrote:
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
On Mon, 7 Aug 2023 13:57:52 -0700
Brett Creeley <brett.creeley@xxxxxxx> wrote:
...
+static int
+pds_vfio_suspend_wait_device_cmd(struct pds_vfio_pci_device *pds_vfio)
+{
+ union pds_core_adminq_cmd cmd = {
+ .lm_suspend_status = {
+ .opcode = PDS_LM_CMD_SUSPEND_STATUS,
+ .vf_id = cpu_to_le16(pds_vfio->vf_id),
+ },
+ };
+ struct device *dev = pds_vfio_to_dev(pds_vfio);
+ union pds_core_adminq_comp comp = {};
+ unsigned long time_limit;
+ unsigned long time_start;
+ unsigned long time_done;
+ int err;
+
+ time_start = jiffies;
+ time_limit = time_start + HZ * SUSPEND_TIMEOUT_S;
+ do {
+ err = pds_vfio_client_adminq_cmd(pds_vfio, &cmd, &comp, true);
+ if (err != -EAGAIN)
+ break;
+
+ msleep(SUSPEND_CHECK_INTERVAL_MS);
+ } while (time_before(jiffies, time_limit));
+
+ time_done = jiffies;
+ dev_dbg(dev, "%s: vf%u: Suspend comp received in %d msecs\n", __func__,
+ pds_vfio->vf_id, jiffies_to_msecs(time_done - time_start));
+
+ /* Check the results */
+ if (time_after_eq(time_done, time_limit)) {
+ dev_err(dev, "%s: vf%u: Suspend comp timeout\n", __func__,
+ pds_vfio->vf_id);
+ err = -ETIMEDOUT;
If the command completes successfully but exceeds the time limit
this turns a success into a failure. Is that desired? Thanks,
Alex
Yes, this is the desired behavior. Based on the testing we have done and
downtime expectations for live migration it seems like the suspend
operation timing out after 5 seconds is reasonable behavior. It could
succeed, but that could also be the case if the timeout value was 10,
20, 30, or more seconds. Even so, it seems like we don't want to keep
spinning in the driver forever and we also don't want to further
increase the VM downtime.
Thanks,
Brett