On 16.10.2018 15:48, John Ferlan wrote: > > > On 10/8/18 7:21 AM, Nikolay Shirokovskiy wrote: >> Let's introduce shutdown reason "daemon" which means we have to >> kill running domain ourselves as the best action we can take at >> that moment. Failure to pick up domain on daemon restart is >> one example of such case. Using reason "crashed" is a bit misleading >> as it is used when qemu is actually crashed or qemu destroyed on >> guest panic as result of user choice that is the problem was >> in qemu/guest itself. So I propose to use "crashed" only for >> qemu side issues and introduce "daemon" when we have to kill the qemu >> for good. > > How about (although reading below may shed some more context): > > This patch introduces a new shutdown reason "daemon" in order > to indicate that the daemon needed to force shutdown the domain > as the best course of action to take at the moment. > > This action would occur during Reconnect processing when it's > determined that either the QEMU monitor no longer exists for > some unknown reason or creation of a thread to reconnect the > domain failed. Mostly I'm fine with this one except "monitor no longer exists" is condition for reason "crashed" or "unknown" (Martin's contribution) > >> >> There is another example where "daemon" will be useful. If we can >> not reboot domain we kill it and got "crashed" reason now. Looks >> like good candidate for "daemon" reason. > > If you feel this way, then a followup patch could/should be posted; > otherwise, this'll be lost to commit message heaven where all good ideas > go to die ;-). Sure! > >> >> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> >> --- >> include/libvirt/libvirt-domain.h | 1 + >> src/conf/domain_conf.c | 3 ++- >> src/qemu/qemu_process.c | 11 ++++------- >> tools/virsh-domain-monitor.c | 3 ++- >> 4 files changed, 9 insertions(+), 9 deletions(-) >> >> diff --git a/include/libvirt/libvirt-domain.h b/include/libvirt/libvirt-domain.h >> index fdd2d6b..11fdab5 100644 >> --- a/include/libvirt/libvirt-domain.h >> +++ b/include/libvirt/libvirt-domain.h >> @@ -145,6 +145,7 @@ typedef enum { >> VIR_DOMAIN_SHUTOFF_FAILED = 6, /* domain failed to start */ >> VIR_DOMAIN_SHUTOFF_FROM_SNAPSHOT = 7, /* restored from a snapshot which was >> * taken while domain was shutoff */ >> + VIR_DOMAIN_SHUTOFF_DAEMON = 8, /* daemon have to kill domain */ > > s/have to/decides to/ > >> # ifdef VIR_ENUM_SENTINELS >> VIR_DOMAIN_SHUTOFF_LAST >> # endif >> diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c >> index 9911d56..e441723 100644 >> --- a/src/conf/domain_conf.c >> +++ b/src/conf/domain_conf.c >> @@ -779,7 +779,8 @@ VIR_ENUM_IMPL(virDomainShutoffReason, VIR_DOMAIN_SHUTOFF_LAST, >> "migrated", >> "saved", >> "failed", >> - "from snapshot") >> + "from snapshot", >> + "daemon") >> >> VIR_ENUM_IMPL(virDomainCrashedReason, VIR_DOMAIN_CRASHED_LAST, >> "unknown", >> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c >> index e9c7618..c4bc9ca 100644 >> --- a/src/qemu/qemu_process.c >> +++ b/src/qemu/qemu_process.c >> @@ -7988,15 +7988,12 @@ qemuProcessReconnect(void *opaque) >> /* We can't get the monitor back, so must kill the VM >> * to remove danger of it ending up running twice if >> * user tries to start it again later > > Since we're changing anyway, let's put the stop there, e.g. > "s/later/later./" > >> - * If we couldn't get the monitor since QEMU supports >> - * no-shutdown, we can safely say that the domain >> - * crashed ... */ >> - state = VIR_DOMAIN_SHUTOFF_CRASHED; >> - /* If BeginJob failed, we jumped here without a job, let's hope another >> + * If BeginJob failed, we jumped here without a job, let's hope another >> * thread didn't have a chance to start playing with the domain yet >> * (it's all we can do anyway). >> */ >> - qemuProcessStop(driver, obj, state, QEMU_ASYNC_JOB_NONE, stopFlags); >> + qemuProcessStop(driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, >> + QEMU_ASYNC_JOB_NONE, stopFlags); >> } >> goto cleanup; >> } >> @@ -8035,7 +8032,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj, >> * is no thread that could be doing anything else with the same domain >> * object. >> */ >> - qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED, >> + qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON, >> QEMU_ASYNC_JOB_NONE, 0); >> qemuDomainRemoveInactiveJobLocked(src->driver, obj); >> > > When qemuProcessReconnectHelper was introduced (commit d38897a5d) > reconnection failure used VIR_DOMAIN_SHUTOFF_FAILED; however, that was > changed in commit bda2f17d to either VIR_DOMAIN_SHUTOFF_CRASHED or > VIR_DOMAIN_SHUTOFF_UNKNOWN. When QEMU_CAPS_NO_SHUTDOWN checking was > removed in commit fe35b1ad6 the conditional state was just left at > VIR_DOMAIN_SHUTOFF_CRASHED. > > Still I agree w/ Martin's thoughts, which unfortunately wasn't commented > on in the code leading to the latter revert commit losing that unknown > reason. The reason for the "-no-shutdown" *did* change and that probably > should have been reflected in the qemuProcessReconnect logic. > > So I think perhaps we should have an initial patch which references or > uses that first paragraph in order to change the code to: > > /* We cannot get the monitor back, so kill the VM to > * remove possibility of it ending up running twice if > * user tries to start it again later. > * > * If we cannot get to the monitor when the QEMU command > * line used -no-shutdown, then we can safely say that the > * domain crashed; otherwise we don't really know. */ > if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES) > state = VIR_DOMAIN_SHUTOFF_CRASHED; > else > state = VIR_DOMAIN_SHUTOFF_UNKNOWN; > > and uses the short commit msg of "qemu: Restore lost shutdown reason". > > Then this followup would change the "UNKNOWN" to "DAEMON" keeping the > CRASHED otherwise. > Then we will get "crashed" reason always for modern qemu and domains that can reboot which is not true. However I see now that the issue is more complicated then I thought. I agree with Martin too but we have to additionally check that connect(2) for domain monitor socket was called and error was ECONNREFUSED. So finally state is resolved like this: if (connect(2) was not called) { state = VIR_DOMAIN_SHUTOFF_UNKNOWN; else if (connect(2) result is NOT ECONNREFUSED) { state = VIR_DOMAIN_SHUTOFF_UNKNOWN; } else if (connect(2) result is ECONNREFUSED) { if (priv->monJSON && priv->allowReboot == VIR_TRISTATE_BOOL_YES) state = VIR_DOMAIN_SHUTOFF_CRASHED; else state = VIR_DOMAIN_SHUTOFF_UNKNOWN; } else { state = VIR_DOMAIN_SHUTOFF_DAEMON; } Moreover we should not later kill process by remembered pid in all cases except last one in the above code because we can kill some other process. Looks like a bit complicated, not sure should we pursuit this one or just left VIR_DOMAIN_SHUTOFF_UNKNOWN only (in this case we can introduce VIR_DOMAIN_SHUTOFF_DAEMON for reboot failures where reason is obvisous). Nikolay > >> diff --git a/tools/virsh-domain-monitor.c b/tools/virsh-domain-monitor.c >> index 3a26363..f0ad558 100644 >> --- a/tools/virsh-domain-monitor.c >> +++ b/tools/virsh-domain-monitor.c >> @@ -212,7 +212,8 @@ VIR_ENUM_IMPL(virshDomainShutoffReason, >> N_("migrated"), >> N_("saved"), >> N_("failed"), >> - N_("from snapshot")) >> + N_("from snapshot"), >> + N_("daemon")) >> >> VIR_ENUM_DECL(virshDomainCrashedReason) >> VIR_ENUM_IMPL(virshDomainCrashedReason, >> -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list