On 07.11.2018 17:29, John Ferlan wrote: > > > On 10/8/18 7:21 AM, Nikolay Shirokovskiy wrote: >> Let's introduce shutdown reason "daemon" which means we have to >> kill running domain ourselves as the best action we can take at >> that moment. Failure to pick up domain on daemon restart is >> one example of such case. Using reason "crashed" is a bit misleading >> as it is used when qemu is actually crashed or qemu destroyed on >> guest panic as result of user choice that is the problem was >> in qemu/guest itself. So I propose to use "crashed" only for >> qemu side issues and introduce "daemon" when we have to kill the qemu >> for good. >> >> There is another example where "daemon" will be useful. If we can >> not reboot domain we kill it and got "crashed" reason now. Looks >> like good candidate for "daemon" reason. >> >> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> >> --- >> include/libvirt/libvirt-domain.h | 1 + >> src/conf/domain_conf.c | 3 ++- >> src/qemu/qemu_process.c | 11 ++++------- >> tools/virsh-domain-monitor.c | 3 ++- >> 4 files changed, 9 insertions(+), 9 deletions(-) >> > > [...] > > Now that I've pushed commits 296e05b54 and 8f0f8425d, let's revisit > this... Merging in that set of changes with this patch means that > instead of: > >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >> index e9c7618..c4bc9ca 100644 >> --- a/src/qemu/qemu_process.c >> +++ b/src/qemu/qemu_process.c >> @@ -7988,15 +7988,12 @@ qemuProcessReconnect(void *opaque) >> /* We can't get the monitor back, so must kill the VM >> * to remove danger of it ending up running twice if >> * user tries to start it again later >> - * If we couldn't get the monitor since QEMU supports >> - * no-shutdown, we can safely say that the domain >> - * crashed ... */ >> - state = VIR_DOMAIN_SHUTOFF_CRASHED; >> - /* If BeginJob failed, we jumped here without a job, let's hope another >> + * If BeginJob failed, we jumped here without a job, let's hope another >> * thread didn't have a chance to start playing with the domain yet >> * (it's all we can do anyway). >> */ >> - qemuProcessStop(driver, obj, state, QEMU_ASYNC_JOB_NONE, stopFlags); >> + qemuProcessStop(driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, >> + QEMU_ASYNC_JOB_NONE, stopFlags); >> } >> goto cleanup; >> } >> @@ -8035,7 +8032,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, >> * is no thread that could be doing anything else with the same domain >> * object. >> */ >> - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, >> + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, >> QEMU_ASYNC_JOB_NONE, 0); >> qemuDomainRemoveInactiveJobLocked(src->driver, obj); >> > > We'd have: > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > index 3bf84834ec..023e187729 100644 > --- a/src/qemu/qemu_process.c > +++ b/src/qemu/qemu_process.c > @@ -7995,10 +7995,14 @@ qemuProcessReconnect(void *opaque) > * > * If we cannot get to the monitor when the QEMU command > * line used -no-shutdown, then we can safely say that the > - * domain crashed; otherwise, we don't really know. */ > + * domain crashed; otherwise, if the monitor was started, > + * then we can blame ourselves, else we failed before the > + * monitor started so we don't really know. */ > if (!priv->mon && tryMonReconn && > qemuDomainIsUsingNoShutdown(priv)) > state = VIR_DOMAIN_SHUTOFF_CRASHED; > + else if (priv->mon) > + state = VIR_DOMAIN_SHUTOFF_DAEMON; > else > state = VIR_DOMAIN_SHUTOFF_UNKNOWN; > > @@ -8045,7 +8049,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, > * is no thread that could be doing anything else with the same > domain > * object. > */ > - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, > + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, > QEMU_ASYNC_JOB_NONE, 0); I think we need to report VIR_DOMAIN_SHUTOFF_UNKNOWN here. This is similar to qemuProcessReconnect before we try to reconnect. May be this hunk need to be placed in distinct patch therefore. > qemuDomainRemoveInactiveJobLocked(src->driver, obj); > > [...] > > So does this essentially meet/handle the condition you were trying to > alter? That is - once we get beyond the monitor reconnection, then if > we end up in error that means the daemon has decided to stop things. > Leaving the UNKNOWN to be the period prior to attempting to reconnect > and CRASHED to the period during which we try to connect to the monitor. > > This also captures failures after connection is successful (when > priv->mon in qemuConnectMonitor). > > Getting more specific failures of why the connection failed is perhaps a > future task if it really matters. This looks good enough for me too. > > If this looks good to you let me know, I can merge and push with the > previously noted changes and a commit message of: > > "This patch introduces a new shutdown reason "daemon" in order > to indicate that the daemon needed to force shutdown the domain > as the best course of action to take at the moment. > > This action would occur during reconnection when processing encounters > an error once the monitor reconnection is successful." > > Besides the first comment I'm fine with this change. I'm going to send a patch to change reason to VIR_DOMAIN_SHUTOFF_DAEMON in appropriate reboot failure cases too. Also one day let's fix the code not to kill unrelated process. Nikolay -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list