Re: libxl driver does not notice domain killed with xl destroy

Jim Fehlig <jfehlig@xxxxxxxx> · Fri, 7 Dec 2018 17:10:12 -0700

On 12/7/18 8:47 AM, Marek Marczykowski-Górecki wrote:
Hi,

If one kills a domain with `xl destroy`, libvirt will not notice it and
still report the domain as running. Also trying to destroy the domain
through libvirt will fail. The only way to recover from such a situation
is to restart libvirt daemon.

Ouch.

While (I think) the right answer is "don't use xl destroy", libvirt
could do better.

Nod. Restarting libvirtd is pretty drastic.

libxl actually report such cases via LIBXL_EVENT_TYPE_DOMAIN_DEATH, but
it is ignored by libvirt's libxl driver.

Why don't we handle that in libxlDomainEventHandler()?

The problem is that
LIBXL_EVENT_TYPE_DOMAIN_DEATH is also reported when it's triggered by
libvirt (for example libxlDomainShutdownHandleDestroy, or
libxlDomainDestroyFlags).

Ah, yes, now I remember why :-).

For a proper solution I need a race-free way for checking if given
LIBXL_EVENT_TYPE_DOMAIN_DEATH was caused by libvirt (and is already
handled - including setting appropriate domain object state etc).

A bit of context:

https://github.com/libvirt/libvirt/blob/master/src/libxl/libxl_domain.c#L750-L765
     libxlDomainDestroyInternal(libxlDriverPrivatePtr driver,
                                virDomainObjPtr vm)
     {
         libxlDriverConfigPtr cfg = libxlDriverConfigGet(driver);
         int ret = -1;

         /* Unlock virDomainObj during destroy, which can take considerable
          * time on large memory domains.
          */
         virObjectUnlock(vm);
         ret = libxl_domain_destroy(cfg->ctx, vm->def->id, NULL);
         virObjectLock(vm);

         virObjectUnref(cfg);
         return ret;
     }

Example usage:
https://github.com/libvirt/libvirt/blob/master/src/libxl/libxl_driver.c#L1357-L1409
     static int
     libxlDomainDestroyFlags(virDomainPtr dom,
                             unsigned int flags)
     {
(...)
         if (libxlDomainObjBeginJob(driver, vm, LIBXL_JOB_MODIFY) < 0)
             goto cleanup;

         if (virDomainObjCheckActive(vm) < 0)
             goto endjob;

         if (libxlDomainDestroyInternal(driver, vm) < 0) {
             virReportError(VIR_ERR_INTERNAL_ERROR,
                            _("Failed to destroy domain '%d'"), vm->def->id);
             goto endjob;
         }

         virDomainObjSetState(vm, VIR_DOMAIN_SHUTOFF,
                              VIR_DOMAIN_SHUTOFF_DESTROYED);

         event = virDomainEventLifecycleNewFromObj(vm, VIR_DOMAIN_EVENT_STOPPED,
                                          VIR_DOMAIN_EVENT_STOPPED_DESTROYED);
(...)
     }

It's tricky, because domain object lock is released for the
libxl_domain_destroy() call time and I think there is no guarantee that
the lock will be taken back by libxlDomainDestroyInternal() before
LIBXL_EVENT_TYPE_DOMAIN_DEATH is reported (and possibly handled). Which means
I can't use domain object state for checking if libvirt already know
about this domain death, because it is set to VIR_DOMAIN_SHUTOFF, after
libxlDomainDestroyInternal() call. And while setting domain
object state twice (once from LIBXL_EVENT_TYPE_DOMAIN_DEATH handler,
then from libxlDomainDestroyInternal() caller) wouldn't be that bad
(only ugly), but emitting domain lifecycle event twice would be a major
problem.

Yes, that is a problem.

Any ideas? Some internal flag set by libxlDomainDestroyInternal() just
before libxl_domain_destroy() call? Where would be a place for such
thing?

In the libxlDomainObjPrivate object, defined in libxl_domain.h?

Regards,
Jim

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list