On Wed, Jun 27, 2007 at 02:43:18PM +0100, Richard W.M. Jones wrote: > virt-install has some code which waits for a domain to appear just after > it has been created. It looks like the loop attached to the end of this > email, and is functional but has two problems. > > Problem (1) is that self.conn.lookupByName doesn't distinguish between a > "Not found" domain and an actual error. For example there is no way to > tell the difference between being unable to contact xend (an actual > error), and being able to contact xend, but xend not being able to find > the domain (not found). It is possible to tell the difference, we just don't report it well. > As shown here: > > >>> import libvirt > >>> conn = libvirt.open ("xen+tls:///") > >>> d = conn.lookupByName ("Domain-0") > >>> d = conn.lookupByName ("doesnotexist") > [...] > libvirt.libvirtError: virDomainLookupByName() failed > > then I deliberately kill the remote daemon: > > >>> d = conn.lookupByName ("doesnotexist") > libvir: Remote error : Error in the push function. > [...] > > The first exception is a Not found condition (not an error) whereas the > second is an error. There is no explicit libvirt error code for 'no such domain' so it is basically impossible to catch this scenario in an app currently This problem is actually not just an issue with virLookup* group of functions. Basically any function which takes a virDomainPtr arg can have a 'no such domain' error, since a domain could have gone away in the time since the app got hold of a virDomainPtr object. The QEMU driver just throws a generic 'internal error', but I think its worth introducing an explicit 'no such domain' and 'no such network' error code and fixing all functions to report these correctly. > > Problem (2) is that virterror is over anxious to print error messages to > stderr, even if the caller can handle them and even if (as in the Not > found case) they don't indicate errors. In practical terms this means > that the virt-install loop attached below may print out 1 or 2 error > messages even when it is functioning normally. You'll see an error like > this appearing [sic]: > > libvir: Xen Daemon error : GET operation failed: In the python bindings all errors are converted into Exceptions, so the python binding really shouldn't be printing out anything to the console at all by default. It'll all be reported as exceptions. The default error reporting func in libvirt is doing this so perhaps we should register a no-op func. > Since it's difficult to change the LookupBy* functions without changing > the ABI, I suspect that the best thing to do is going to be to add a new > call with better semantics. Therefore I suggest: > > virDomainPtr * > virDomainLookup (virConnectPtr conn, int flags, > int id, char *str, int *error); > > where flags is one of: > VIR_LOOKUP_BY_ID, VIR_LOOKUP_BY_NAME, VIR_LOOKUP_BY_UUID > or VIR_LOOKUP_BY_UUID_STRING > > The return values are: > ret = domain, *error = 0 => found it > ret = NULL, *error = 0 => not found > ret = NULL, *error = 1 => error (check virterror) I'd do it the other way around, returning the error code, and putting the domain object into a parameter int virDomainLookup (virConnectPtr conn, int flags, int id, char *str, virDomainPtr *dom); That said, I'm not convinced we need this if we fix the error reporting of the original functions to allow the 'no such domain' error to be reliably caught & handled. Dan, -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=|