Re: [RFC PATCH] Use PAUSED state for domains that are starting up

Jiri Denemark <jdenemar@xxxxxxxxxx> · Thu, 19 Feb 2015 17:07:45 +0100

On Mon, Feb 16, 2015 at 15:07:19 +0000, Daniel P. Berrange wrote:
> On Mon, Feb 16, 2015 at 04:03:50PM +0100, Jiri Denemark wrote:
> > On Mon, Feb 16, 2015 at 14:57:17 +0000, Daniel P. Berrange wrote:
> > > On Mon, Feb 16, 2015 at 03:50:41PM +0100, Jiri Denemark wrote:
> > > > When libvirt is starting a domain, it reports the state as SHUTOFF until
> > > > it's RUNNING. This is not ideal because domain startup may take a long
> > > > time (usually because of some configuration issues, firewalls blocking
> > > > access to network disks, etc.) and domain lists provided by libvirt look
> > > > awkward. One can see weird shutoff domains with IDs in a list of active
> > > > domains or even shutoff transient domains. In any case, it looks more
> > > > like a bug in libvirt than a normal state a domain goes through.
> > > 
> > > A shutoff transient domain isn't too bad IMHO, but a shutoff domain
> > > with an ID number is definitely not expected.
> > > 
> > > Could we perhaps address it by ensuring that we always return '-1'
> > > for ID if the state is "SHUTOFF", even if def->id has a positive
> > > value ?
> > 
> > But we should somehow make it clear that the domain is actually there,
> > only not completely usable. That is, one may need to actually call
> > virsh destroy on such a domain to get rid of the leftover process if
> > something goes wrong.
> 
> Hmm, if something goes wrong due to virDomainStart though, we should be
> tearing down the QEMU process. IIRC we should even be kill -9'ing QEMU,
> so even if QEMU is stuck in an uninterruptible sleep and won't exit,
> once the (storage?) problem causing that sleep is resolved QEMU will
> exit without further intervention. Similarly, calling 'destroy' again
> won't make it any more likely to quit once it has had a SIGKILL.

You're right, of course. However, I still feel we should distinguish a
shutoff domain from a domain that is being started. Considering it
shutoff until we have a monitor connection may cause all sorts of
confusion. Besides shutoff transient domains, one can see a shutoff
domain that cannot be started because it is already running (or perhaps
because acquiring a job fails). It is also impossible to distinguish a
domain that was running previously and wasn't cleaned up for whatever
reason (most likely a bug in libvirt) from the normal state in which
libvirt is waiting for the monitor to show up...

Jirka

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list