Side note: The 'v2' should be in the subject prefix, not the subject. E.g. 'git
send-email --subject-prefix="vhostmd PATCH V2" ...'.
On 06/19/2018 07:33 AM, Michael Trapp wrote:
vhostmd has no signal handler for SIGPIPE and a restart of libvirtd results in a
stopped vhostmd. The root cause seems to be a UDS socket between vhostmd and
libvirtd which is closed by a libvirtd restart. In addition to the signal handler
the connection to libvirtd has to be opened again otherwise vhostmd can't read
any data from libvirtd and doesn't update the metrics.
I think most of the text below here can be removed from the commit message. IMO
it is information that is interesting for patch reviewers (and thus should
included after the '---') but not necessary in the commit message. It is
sufficient to describe the problem and a brief comment on how the patch fixes it.
Based on Daniel's reply, I've added a callback function for the connection close.
Well, the current patch is a minor change of the vhostmd connection handling but
it might be sufficient to solve the reconnect issue in the available vhostmd implementation.
Ensure any lines in the commit message > 80 columns are trimmed.
The reconnect can be checked with
service libvirtd stop
This results in a closed UDS socket of the vhostmd process
/var/run/libvirt/libvirt-sock-ro is not available anymore
and just to see the connection status on libvirt layer
virConnectIsAlive() returns -1 for the lost connection
after a
service libvirtd start
vhostmd connects to libvirtd with
connect(6, {sa_family=AF_FILE, path="/var/run/libvirt/libvirt-sock-ro"}, 110) = 0
and /proc/PID/fd contains
lrwx------ 1 root root 64 ... 6 -> socket:[173298]
and the VM metrics are updated again.
The libvirt API documents that virConnectIsAlive() 'Returns 1 if alive, 0 if dead, -1 on error'.
From vhostmd perspective '-1' should definitely result in a reconnect, because that's what we see
when libvirtd is restarted and vhostmd can recover from the closed connection.
At the moment I don't see a situation where vhostmd shouldn't try to reconnect and therefore
virConnectIsAlive() is not used.
As there seems to be no 'reconnect' function in the libvirt API, the connection is closed and
opened again with the assumption that this does not result in a leak in libvirt.
The reconnect is triggered once in a 'metric collection loop' and in case of a failed connection the error
is reported to the caller. If there is any additional error handling required, it should be implemented
in an upper layer because the error is already noticed in vhostmd_run() and a stop (restart) of vhostmd
must be handled in this function.
---
vhostmd/vhostmd.c | 2 ++
vhostmd/virt-util.c | 45 +++++++++++++++++++++++++++++++++++++--------
2 files changed, 39 insertions(+), 8 deletions(-)
diff --git a/vhostmd/vhostmd.c b/vhostmd/vhostmd.c
index 7f04705..4cf4630 100644
--- a/vhostmd/vhostmd.c
+++ b/vhostmd/vhostmd.c
@@ -117,6 +117,7 @@ static void sig_handler(int sig, siginfo_t *siginfo ATTRIBUTE_UNUSED,
case SIGQUIT:
down = 1;
break;
+ case SIGPIPE:
default:
break;
}
@@ -1053,6 +1054,7 @@ int main(int argc, char *argv[])
sigaction(SIGINT, &sig_action, NULL);
sigaction(SIGQUIT, &sig_action, NULL);
sigaction(SIGTERM, &sig_action, NULL);
+ sigaction(SIGPIPE, &sig_action, NULL);
xmlInitParser();
diff --git a/vhostmd/virt-util.c b/vhostmd/virt-util.c
index 1c31305..0358826 100644
--- a/vhostmd/virt-util.c
+++ b/vhostmd/virt-util.c
@@ -26,21 +26,48 @@
#include "util.h"
+enum {
+ CLOSED = 0,
+ ESTABLISHED
+} connection = CLOSED;
+
static virConnectPtr conn = NULL;
const char *libvirt_uri = NULL;
+void
+conn_close_cb (virConnectPtr c,
+ int reason,
+ void *p)
I realize the coding style in vhostmd can make your eyes hurt, but let's try to
stick with the prevailing style of no space between function name and the parens
of its argument list. Also all parameters on a single line if they fit.
+{
+ (void) p;
+ (void) reason;
You can use ATTRIBUTE_UNUSED instead. E.g.
void
conn_close_cb(virConnectPtr c,
int reason ATTRIBUTE_UNUSED,
void *p ATTRIBUTE_UNUSED)
+
+ if (c == conn) connection = CLOSED;
+}
+
static int
do_connect (void)
{
+ if (connection == ESTABLISHED) return 0;
+
+ if (conn != NULL) virConnectClose (conn);
This is the style I dislike the most in the vhostmd code :-). I know there is a
fair bit of it, but whitespace is cheap so lets make use of it and improve
readability
if (connection == ESTABLISHED)
return 0;
if (conn != NULL)
virConnectClose (conn);
+
+ conn = virConnectOpenReadOnly (libvirt_uri);
if (conn == NULL) {
- conn = virConnectOpenReadOnly (libvirt_uri);
- if (conn == NULL) {
- vu_log (VHOSTMD_ERR, "Unable to open libvirt connection to %s",
- libvirt_uri ? libvirt_uri : "default hypervisor");
- return -1;
- }
+ vu_log (VHOSTMD_ERR, "Unable to open libvirt connection to %s",
+ libvirt_uri ? libvirt_uri : "default hypervisor");
+ return -1;
}
+
+ if (virConnectRegisterCloseCallback (conn, conn_close_cb, NULL, NULL)) {
+ vu_log (VHOSTMD_ERR, "Unable to register callback 'virConnectCloseFunc'");
+ virConnectClose (conn);
+ conn = NULL;
+ return -1;
+ }
Same comment here about space between function name and parens.
+
+ connection = ESTABLISHED;
return 0;
}
@@ -98,8 +125,8 @@ vu_vm *vu_get_vm(int id)
void vu_vm_free(vu_vm *vm)
{
if (vm) {
- free(vm->name);
- free(vm->uuid);
+ if (vm->name) free(vm->name);
+ if (vm->uuid) free(vm->uuid);
Separate bug fix for a separate patch? And as part of it call vu_vm_free() in
vu_get_vm() error path?
free(vm);
}
}
@@ -107,8 +134,10 @@ void vu_vm_free(vu_vm *vm)
void vu_vm_connect_close()
{
if (conn) {
+ virConnectUnregisterCloseCallback(conn, conn_close_cb);
virConnectClose(conn);
conn = NULL;
}
+ connection = CLOSED;
}
The patch works fine in my testing. Can you fix the nits and send a V3? I was
going to fix them myself until I came across the unrelated bug fix mentioned
above, which should be submitted as a separate patch.
Regards,
Jim
_______________________________________________
virt-tools-list mailing list
virt-tools-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/virt-tools-list