On Wed, Nov 05, 2008 at 04:26:15PM +0100, Daniel Veillard wrote: > First version of a small patch allowing to gather debug informations > from a running libvirt daemon. Basically one can send signal USR2 > to the daemon, the daemon will from that point dump its current internal > state and all verbose debug output to a file /tmp/libvirt_debug_xxxx. > When sending back signal USR2 the file is closed and can be looked at > for analysis, this allow to save extensive debug informations from a > running daemon. Hardcoding a file in /tmp is not very good practice - it'll certainly not play nicely with SELinux. We should make the logfile name be configurable via /etc/libvirt/libvirtd.conf. > The patch is rather minimal right now, it just applies to > qemud/qemud.c, modifies it to have global variables for error and > information output FILE, an init routing setting them up and hooking > a handler for USR2, the handler, a very minimal state dump. But it > works as is. > > Now I would like to extend that debugging to the library itself, so > any app linking to libvirt can be debugged in the same way. I would > also like to add debugging routines for most internal data structures. > I'm still wondering about the best form for those, should they use > a FILE * argument, an fd argument or a virBufferPtr for genericity > (probably the later would make most sense). > > #ifdef ENABLE_DEBUG > void virConnectDebug(virBufferPtr buf, virConnectPtr conn); > #endif > > and similar for main internal data structures and drivers > The patch is just a work in progress but trying to get early feedback. > Maybe this could be used to get remote introspection capabilities too > but ATM I'm more looking at providing an easy way to get debug > informations on a libvirt program. I think use virBufferPtr is rather too cumbersome for the calling program. The internal API for logging currently just takes void qemudLog(int priority, const char *fmt, ...) Either we can pass variable args out to the app, or format the error message and then pass it out. The calling app can write this straight to a file, send to syslog, or anything else it wants, without having the intermediate step of potentially uneccessary buffering. I think it'll also be desirable to include a 'category' string to allow applications to filter the messages it uses. In the Java world, with log4j, the typical pattern is that each class registers a logging category matching its classname, eg you might have org.libvirt.Connect, org.libvirt.Domain, etc. The application using the class can configure the log4j infrastructure to turn on/off log messages per-category. We need a similar capability in libvirt - the existing logging in the daemon is already non-scalable because the 'EVENT' messages are soo frequent they typically drown out the more interesting stuff from the daemon. In parallel with debugging / logging of functional aspects of the daemon / library, it is also desirable to get the ability to record a log of operations invoked on objects. eg, I want to record all operation invoked against a particular driver, so as a support person I can ask a bug-reporter to send me the log of the operations they did leading upto the problem. As an application using libvirt API, I'd expect to have ability to register a callback to receive debug info, and APIs to control the logging level enum { VIR_LOG_DEBUG, VIR_LOG_INFO, VIR_LOG_WARN, VIR_LOG_ERROR, }; typedef void (*virLogHandler)(const char *category, int priority, const char *msg); virLogAddHandler(virLogHandler callback); virLogSetDefaultPriority(int priority); virLogSetPriority(const char *category, int priority); And a semi-public internal API (ie for use by libvirt.so and libvirtd) virLogMessage(const char *category, int priority, const char *fmt, ....); I'd be inclined to say that categories are named to match the filename, so if you wanted to log messasges from the event.c file, at a level of 'INFO' or higher, you'd do void mylogger(const char *cat, int prior, const char *msg) { fprintf(stderr, "%s", msg); } virLogAddHandler(mylogger); virLogSetPriority("file.events", VIR_INFO) The default priority would be 'WARN', unless explicitly set for a category. In the context of libvirtd daemon, if you passed '--verbose' we'd set virLogDefaultPriority(VIR_LOG_INFO) and if you passed --debug we'd set it to VIR_LOG_DEBUG. So the same public API would be useful both for libvirtd, and other apps linking to libvirt.so in just the same way - we'd merely need to expose virLogMessage() to the daemon so its own log messages can flow through the same channels as those inside the library. Currently we toggle between using stderr, and syslog according to whether the --deamon flag is passed. This isn't particularly great. We should be configurable independantly of thise. I'd expect /etc/libvirt/libvirtd.conf to allow you to set # Choice of 'null', 'syslog', 'stderr', 'file' log.backend = "syslog" # If 'file' was chosen, then also allow log.file = "/var/log/libvirtd.log" # or if 'syslog' was chosen, then allow admin to set the # syslog facility name for openlog() call. log.syslog = "libvirtd" # One of VIR_LOG_XXX constants log.priority = "WARN" # And per category overrides log.category.events = "DEBUG" log.category.mdns = "INFO" We already have SIGUSR1 hooked up to make it re-read the domain XML files, so we could extend this to also re-read the master config file. So if someone wanted to temporarily debug the system they could just edit 'log.backend' from 'null' to 'file', and send the 'SIGUSR1'. Or if they already have it logging to a file by default and just wanted to increase verbosity, they'd edit 'log.priority' and send SIGUSR1. At any time, they can also send SIGUSR2 as per your patch, and get a dump listing all active sockets, sent to whatever log.backend they currently have configured - either syslog, or file, or whateve else we implement. This comprehensive logging solution is rather alot of work, so for immediate use I'd suggest just adding the config for 'log.file' and 'log.backend' choice, and ignore the per-category debug levels for 0.5.0 release. > Also I'm unclear, do we really want to have all the debug strings > internationalized with _() , that's more work for localization team > and it's unclear this would benefit 'end users'. I think debug messages are potentially important not to localize. They are not typically intended for end users, but for developers & vendor support people, where having everything in english may make it easier to search for similar problems. > > +static FILE *error_out = NULL; > +static FILE *info_out = NULL; > +static int debug_activated = 0; > + > +#ifdef ENABLE_DEBUG > + > +/* > + * Signal entry point on USR2 we can't do anything at that point except > + * log the signal and have qemudToggleDebug called when back into the > + * main loop. > + */ > +static void sig_debug(int sig, siginfo_t * siginfo, void* context) { > + sig_handler(sig, siginfo, context); > +} > + > +/* > + * Debug the state of a client > + */ > +static void qemudDumpDebugClient(struct qemud_client *client) { > + if (client->magic != QEMUD_CLIENT_MAGIC) { > + qemudLog(QEMUD_DEBUG, > + _("\nQEmud client: invalid magic %X, skipping\n"), > + (unsigned int) client->magic); > + return; > + } > + qemudLog(QEMUD_DEBUG, _("QEmud client: fd %d readonly %d mode %d"), > + client->fd, client->readonly, client->mode); > + > + /* TODO: more complete dump of state, especially the connection */ > +} > + > +/* > + * Debug the state of a server > + */ > +static void qemudDumpDebugServer(struct qemud_server *server) { > + struct qemud_client *client; > + > + if (server == NULL) { > + qemudLog(QEMUD_DEBUG, "%s", > + _("QEmud server: NULL pointer\n")); > + return; > + } > + qemudLog(QEMUD_DEBUG, > + _("QEmud server: listening on %d sockets, %d clients\n"), > + server->nsockets, server->nclients); > + > + client = server->clients; > + > + while (client != NULL) { > + qemudDumpDebugClient(client); > + client = client->next; > + } > +} > + > +/* > + * Toggle the debug status on/off, on on create a new temporary > + * debug file and start saving the output there > + * It is called when the signal USR2 is received. > + */ > +static void qemudToggleDebug(struct qemud_server *server) { > + if (debug_activated == 0) { > + char path[50] = "/tmp/libvirt_debug_XXXXXX"; This path needs to be configurable from the /etc/libvirt/libvirtd.conf file, and default to /var/log to be selinux compliant. > + int fd = mkstemp(path); > + if (fd >= 0) { > + error_out = fdopen(fd, "a"); > + if (error_out != NULL) { > + info_out = error_out; > + debug_activated = 1; > + qemudDumpDebugServer(server); > + } else { > + qemudLog(QEMUD_ERR, > + "%s", _("Failed to create temporary debug file")); > + error_out = stderr; > + } > + } else { > + qemudLog(QEMUD_ERR, > + "%s", _("Failed to get temporary debug file")); > + } > + } else { > + debug_activated = 0; > + fclose(error_out); > + error_out = stderr; > + info_out = stdout; > + } > +} > +#endif > + > +/* > + * Set up the debugging environment > + */ > +static void qemudInitDebug(void) { > +#ifdef ENABLE_DEBUG > + struct sigaction oldact; > + struct sigaction sig_action; > + > + /* > + * if there is already an handler, leave it as is to > + * avoid disturbing the application's behaviour > + */ > + if (sigaction (SIGUSR2, NULL, &oldact) == 0) { > + if (oldact.sa_handler == NULL && oldact.sa_sigaction == NULL) { > + sig_action.sa_sigaction = sig_debug; > + sigaction(SIGUSR2, &sig_action, NULL); > + } > + } Since this is inside the daemon, we already know that no other SIGUSR2 handler is present, so can ignore that check. > +#endif > + error_out = stderr; > + info_out = stdout; > +} > + > static void qemudDispatchClientEvent(int fd, int events, void *opaque); > static void qemudDispatchServerEvent(int fd, int events, void *opaque); > static int qemudRegisterClientEvent(struct qemud_server *server, > @@ -260,7 +370,11 @@ qemudDispatchSignalEvent(int fd ATTRIBUT > siginfo.si_signo); > server->shutdown = 1; > break; > - > +#ifdef ENABLE_DEBUG > + case SIGUSR2: > + qemudToggleDebug(server); > + break; > +#endif IMHO, toggle of log level should be done with SIGUSR1 and libvirtd.conf reload, and SIGUSR2 should just be doing the 'state dump' of active sockets, clients & other interesting info, to whatever logging target is active, be it syslog or a file, or stderr. > @@ -306,7 +420,7 @@ void qemudLog(int priority, const char * > > va_start(args, fmt); > > - if (godaemon) { > + if ((godaemon) && (!debug_activated)) { > int sysprio = -1; > > switch(priority) { > @@ -336,22 +450,22 @@ void qemudLog(int priority, const char * > switch(priority) { > case QEMUD_ERR: > case QEMUD_WARN: > - vfprintf(stderr, fmt, args); > - fputc('\n', stderr); > + vfprintf(error_out, fmt, args); > + fputc('\n', error_out); > break; > > case QEMUD_INFO: > - if (verbose) { > - vprintf(fmt, args); > - fputc('\n', stdout); > + if ((verbose) || (debug_activated)) { > + vfprintf(info_out, fmt, args); > + fputc('\n', info_out); > } > break; > > #ifdef ENABLE_DEBUG > case QEMUD_DEBUG: > - if (verbose) { > - vprintf(fmt, args); > - fputc('\n', stdout); > + if ((verbose) || (debug_activated)) { > + vfprintf(info_out, fmt, args); > + fputc('\n', info_out); > } > break; This qemudLog() function is really flawed - if toggles between the use of syslog() vs stderr, based on whether you pass --daemon flag. It needs to be configurable by the config file, independantly of --daemon. As this patch stands, nothing will ever be logged because 99% of libvirtd instances run with --daemon flag set, and thus are using the syslog() code branch > #endif > @@ -2163,12 +2277,14 @@ int main(int argc, char **argv) { > return 2; > > default: > - fprintf (stderr, "libvirtd: internal error: unknown flag: %c\n", > + fprintf (error_out, "libvirtd: internal error: unknown flag: %c\n", > c); > exit (1); > } > } > > + qemudInitDebug(); > + > if (godaemon) { > openlog("libvirtd", 0, 0); > if (qemudGoDaemon() < 0) { > QEmud server: listening on 2 sockets, 1 clients > > QEmud client: fd 10 readonly 0 mode 0 > EVENT: Calculate expiry of 1 timers > EVENT: Timeout at 0 due in -1 ms > EVENT: Poll on 5 handles 0x24d91f0 timeout -1 > EVENT: Poll got 1 event > EVENT: Dispatch 10 1 0x24c1610 > EVENT: Remove handle 10 > EVENT: mark delete 4 > EVENT: Add handle 10 13 0x4087d0 0x24c1610 > EVENT: Remove handle 10 > EVENT: mark delete 5 > EVENT: Add handle 10 14 0x4087d0 0x24c1610 > EVENT: Calculate expiry of 1 timers THis pile of (mostly useless) EVENT log messages is why we need to ultimately have 'category' filters per file :-) Regards, Daniel -- |: Red Hat, Engineering, London -o- http://people.redhat.com/berrange/ :| |: http://libvirt.org -o- http://virt-manager.org -o- http://ovirt.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: GnuPG: 7D3B9505 -o- F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :| -- Libvir-list mailing list Libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list