On 10/8/18 7:21 AM, Nikolay Shirokovskiy wrote: > Let's introduce shutdown reason "daemon" which means we have to > kill running domain ourselves as the best action we can take at > that moment. Failure to pick up domain on daemon restart is > one example of such case. Using reason "crashed" is a bit misleading > as it is used when qemu is actually crashed or qemu destroyed on > guest panic as result of user choice that is the problem was > in qemu/guest itself. So I propose to use "crashed" only for > qemu side issues and introduce "daemon" when we have to kill the qemu > for good. > > There is another example where "daemon" will be useful. If we can > not reboot domain we kill it and got "crashed" reason now. Looks > like good candidate for "daemon" reason. > > Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> > --- > include/libvirt/libvirt-domain.h | 1 + > src/conf/domain_conf.c | 3 ++- > src/qemu/qemu_process.c | 11 ++++------- > tools/virsh-domain-monitor.c | 3 ++- > 4 files changed, 9 insertions(+), 9 deletions(-) > [...] Now that I've pushed commits 296e05b54 and 8f0f8425d, let's revisit this... Merging in that set of changes with this patch means that instead of: > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > index e9c7618..c4bc9ca 100644 > --- a/src/qemu/qemu_process.c > +++ b/src/qemu/qemu_process.c > @@ -7988,15 +7988,12 @@ qemuProcessReconnect(void *opaque) > /* We can't get the monitor back, so must kill the VM > * to remove danger of it ending up running twice if > * user tries to start it again later > - * If we couldn't get the monitor since QEMU supports > - * no-shutdown, we can safely say that the domain > - * crashed ... */ > - state = VIR_DOMAIN_SHUTOFF_CRASHED; > - /* If BeginJob failed, we jumped here without a job, let's hope another > + * If BeginJob failed, we jumped here without a job, let's hope another > * thread didn't have a chance to start playing with the domain yet > * (it's all we can do anyway). > */ > - qemuProcessStop(driver, obj, state, QEMU_ASYNC_JOB_NONE, stopFlags); > + qemuProcessStop(driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, > + QEMU_ASYNC_JOB_NONE, stopFlags); > } > goto cleanup; > } > @@ -8035,7 +8032,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, > * is no thread that could be doing anything else with the same domain > * object. > */ > - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, > + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, > QEMU_ASYNC_JOB_NONE, 0); > qemuDomainRemoveInactiveJobLocked(src->driver, obj); > We'd have: diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c index 3bf84834ec..023e187729 100644 --- a/src/qemu/qemu_process.c +++ b/src/qemu/qemu_process.c @@ -7995,10 +7995,14 @@ qemuProcessReconnect(void *opaque) * * If we cannot get to the monitor when the QEMU command * line used -no-shutdown, then we can safely say that the - * domain crashed; otherwise, we don't really know. */ + * domain crashed; otherwise, if the monitor was started, + * then we can blame ourselves, else we failed before the + * monitor started so we don't really know. */ if (!priv->mon && tryMonReconn && qemuDomainIsUsingNoShutdown(priv)) state = VIR_DOMAIN_SHUTOFF_CRASHED; + else if (priv->mon) + state = VIR_DOMAIN_SHUTOFF_DAEMON; else state = VIR_DOMAIN_SHUTOFF_UNKNOWN; @@ -8045,7 +8049,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, * is no thread that could be doing anything else with the same domain * object. */ - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, QEMU_ASYNC_JOB_NONE, 0); qemuDomainRemoveInactiveJobLocked(src->driver, obj); [...] So does this essentially meet/handle the condition you were trying to alter? That is - once we get beyond the monitor reconnection, then if we end up in error that means the daemon has decided to stop things. Leaving the UNKNOWN to be the period prior to attempting to reconnect and CRASHED to the period during which we try to connect to the monitor. This also captures failures after connection is successful (when priv->mon in qemuConnectMonitor). Getting more specific failures of why the connection failed is perhaps a future task if it really matters. If this looks good to you let me know, I can merge and push with the previously noted changes and a commit message of: "This patch introduces a new shutdown reason "daemon" in order to indicate that the daemon needed to force shutdown the domain as the best course of action to take at the moment. This action would occur during reconnection when processing encounters an error once the monitor reconnection is successful." John -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list