Re: [PATCH] libvirt: add daemon itself as shutdown reason

Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> · Thu, 8 Nov 2018 07:03:11 +0000

On 07.11.2018 17:29, John Ferlan wrote:
> 
> 
> On 10/8/18 7:21 AM, Nikolay Shirokovskiy wrote:
>> Let's introduce shutdown reason "daemon" which means we have to
>> kill running domain ourselves as the best action we can take at
>> that moment. Failure to pick up domain on daemon restart is
>> one example of such case. Using reason "crashed" is a bit misleading
>> as it is used when qemu is actually crashed or qemu destroyed on
>> guest panic as result of user choice that is the problem was
>> in qemu/guest itself. So I propose to use "crashed" only for
>> qemu side issues and introduce "daemon" when we have to kill the qemu
>> for good.
>>
>> There is another example where "daemon" will be useful. If we can
>> not reboot domain we kill it and got "crashed" reason now. Looks
>> like good candidate for "daemon" reason.
>>
>> Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx>
>> ---
>>  include/libvirt/libvirt-domain.h |  1 +
>>  src/conf/domain_conf.c           |  3 ++-
>>  src/qemu/qemu_process.c          | 11 ++++-------
>>  tools/virsh-domain-monitor.c     |  3 ++-
>>  4 files changed, 9 insertions(+), 9 deletions(-)
>>
> 
> [...]
> 
> Now that I've pushed commits 296e05b54 and 8f0f8425d, let's revisit
> this...  Merging in that set of changes with this patch means that
> instead of:
> 
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index e9c7618..c4bc9ca 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -7988,15 +7988,12 @@ qemuProcessReconnect(void *opaque)
>>          /* We can't get the monitor back, so must kill the VM
>>           * to remove danger of it ending up running twice if
>>           * user tries to start it again later
>> -         * If we couldn't get the monitor since QEMU supports
>> -         * no-shutdown, we can safely say that the domain
>> -         * crashed ... */
>> -        state = VIR_DOMAIN_SHUTOFF_CRASHED;
>> -        /* If BeginJob failed, we jumped here without a job, let's hope another
>> +         * If BeginJob failed, we jumped here without a job, let's hope another
>>           * thread didn't have a chance to start playing with the domain yet
>>           * (it's all we can do anyway).
>>           */
>> -        qemuProcessStop(driver, obj, state, QEMU_ASYNC_JOB_NONE, stopFlags);
>> +        qemuProcessStop(driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON,
>> +                        QEMU_ASYNC_JOB_NONE, stopFlags);
>>      }
>>      goto cleanup;
>>  }
>> @@ -8035,7 +8032,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj,
>>           * is no thread that could be doing anything else with the same domain
>>           * object.
>>           */
>> -        qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED,
>> +        qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON,
>>                          QEMU_ASYNC_JOB_NONE, 0);
>>          qemuDomainRemoveInactiveJobLocked(src->driver, obj);
>>  
> 
> We'd have:
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 3bf84834ec..023e187729 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -7995,10 +7995,14 @@ qemuProcessReconnect(void *opaque)
>           *
>           * If we cannot get to the monitor when the QEMU command
>           * line used -no-shutdown, then we can safely say that the
> -         * domain crashed; otherwise, we don't really know. */
> +         * domain crashed; otherwise, if the monitor was started,
> +         * then we can blame ourselves, else we failed before the
> +         * monitor started so we don't really know. */
>          if (!priv->mon && tryMonReconn &&
>              qemuDomainIsUsingNoShutdown(priv))
>              state = VIR_DOMAIN_SHUTOFF_CRASHED;
> +        else if (priv->mon)
> +            state = VIR_DOMAIN_SHUTOFF_DAEMON;
>          else
>              state = VIR_DOMAIN_SHUTOFF_UNKNOWN;
> 
> @@ -8045,7 +8049,7 @@ qemuProcessReconnectHelper(virDomainObjPtr obj,
>           * is no thread that could be doing anything else with the same
> domain
>           * object.
>           */
> -        qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_FAILED,
> +        qemuProcessStop(src->driver, obj, VIR_DOMAIN_SHUTOFF_DAEMON,
>                          QEMU_ASYNC_JOB_NONE, 0);

I think we need to report VIR_DOMAIN_SHUTOFF_UNKNOWN here. This is similar to 
qemuProcessReconnect before we try to reconnect. May be this hunk need to be 
placed in distinct patch therefore.

>          qemuDomainRemoveInactiveJobLocked(src->driver, obj);
> 
> [...]
> 
> So does this essentially meet/handle the condition you were trying to
> alter?  That is - once we get beyond the monitor reconnection, then if
> we end up in error that means the daemon has decided to stop things.
> Leaving the UNKNOWN to be the period prior to attempting to reconnect
> and CRASHED to the period during which we try to connect to the monitor.
> 
> This also captures failures after connection is successful (when
> priv->mon in qemuConnectMonitor).
> 

> Getting more specific failures of why the connection failed is perhaps a
> future task if it really matters.

This looks good enough for me too.

> 
> If this looks good to you let me know, I can merge and push with the
> previously noted changes and a commit message of:
> 
> "This patch introduces a new shutdown reason "daemon" in order
> to indicate that the daemon needed to force shutdown the domain
> as the best course of action to take at the moment.
> 
> This action would occur during reconnection when processing encounters
> an error once the monitor reconnection is successful."
> 
> 

Besides the first comment I'm fine with this change. I'm going
to send a patch to change reason to VIR_DOMAIN_SHUTOFF_DAEMON in 
appropriate reboot failure cases too. Also one day let's fix the code not to kill
unrelated process.

Nikolay

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list