On 3/12/21 1:44 AM, Peter Krempa wrote:
On Thu, Mar 11, 2021 at 16:47:54 -0700, Jim Fehlig wrote:
On 3/10/21 9:37 AM, Peter Krempa wrote:
Commit 94e45d1042e broke exec-restart of virtlogd and virtlockd as the
code waiting for the daemon shutdown closed the daemons before
exec-restarting.
This reminds me of an odd issue we encountered three years ago, fixed by Daniel
https://listman.redhat.com/archives/libvir-list/2018-March/msg00298.html
I tested your patches but noticed that locks are still lost on re-exec.
qemu.conf:
lock_manager = "lockd"
qemu-lockd.conf:
file_lockspace_dir = "/var/lib/libvirt/lockspace"
/var/lib/libvirt/lockspace is nothing special, xfs on a local disk. After
starting a VM:
# ls /var/lib/libvirt/lockspace/
a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c
# lslocks | grep lockd
virtlockd 95009 POSIX WRITE 0 0 0 /var/lib/libvirt/lockspace/a89872e150e6b9e4cbd59ef2bd289bc6cd0a8fa6fbf533c41957f77a90381e9c
virtlockd 95009 POSIX 5B WRITE 0 0 0 /run/virtlockd.pid
# systemctl reload virtlockd
Could you make sure that the virtlockd process has the same pid before
and after, so that it wasn't actually restarted by systemctl?
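One way to run that check is sketched below. It is only an illustration, assuming the daemon runs as a systemd unit and that the MainPID property reported by "systemctl show" reflects the running virtlockd process. The helper names main_pid and same_pid_after_reload are made up for this example and are not part of libvirt. If the reload is handled asynchronously, a short delay before the second query may be needed.

```shell
#!/bin/sh
# Sketch only: report whether a systemd unit kept its main PID across a reload.
# main_pid and same_pid_after_reload are hypothetical helper names.

# Print the main PID that systemd tracks for the given unit.
main_pid() {
    systemctl show --property=MainPID --value "$1"
}

# Reload the unit and compare the PID before and after.
# Returns 0 if the PID is unchanged, non-zero otherwise.
same_pid_after_reload() {
    unit=$1
    before=$(main_pid "$unit")
    systemctl reload "$unit"
    after=$(main_pid "$unit")
    echo "before=$before after=$after"
    [ "$before" = "$after" ]
}

# Example use (run as root):
#   same_pid_after_reload virtlockd && echo "same process" || echo "PID changed"
```

An unchanged PID suggests the daemon re-executed itself in place. A changed PID means a new process was started, so any locks held by the old one would be expected to disappear regardless of the exec-restart code.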
I thought I checked it, but apparently not...
I'm asking because in my current test I've encountered another crash
when exec-restarting:
2021-03-12 08:41:31.649+0000: 2765718: error : virJSONValueToBuffer:1946 : internal error: failed to convert virJSONValue to yajl data
double free or corruption (fasttop)
Program received signal SIGABRT, Aborted.
0x00007ffff77819d5 in raise () from /lib64/libc.so.6
(gdb) bt
#0 0x00007ffff77819d5 in raise () at /lib64/libc.so.6
#1 0x00007ffff776a8a4 in abort () at /lib64/libc.so.6
#2 0x00007ffff77c4177 in __libc_message () at /lib64/libc.so.6
#3 0x00007ffff77cbe6c in annobin_top_check.start () at /lib64/libc.so.6
#4 0x00007ffff77cd393 in _int_free () at /lib64/libc.so.6
#5 0x00007ffff7a0b70d in g_free () at /lib64/libglib-2.0.so.0
#6 0x00007ffff7c0977f in virJSONValueFree (value=0x5555555710b0) at ../../../libvirt/src/util/virjson.c:401
#7 0x000055555555c3f2 in glib_autoptr_clear_virJSONValue (_ptr=0x5555555c4250) at ../../../libvirt/src/util/virjson.h:173
#8 glib_autoptr_cleanup_virJSONValue (_ptr=<synthetic pointer>) at ../../../libvirt/src/util/virjson.h:173
#9 virLockDaemonPreExecRestart (argv=0x7fffffffe428, dmn=<optimized out>, state_file=<optimized out>) at ../../../libvirt/src/locking/lock_daemon.c:700
#10 main (argc=<optimized out>, argv=0x7fffffffe428) at ../../../libvirt/src/locking/lock_daemon.c:1148
because looking again I'm seeing the same crash. Facepalm!
Looks like a double free. I'll post patches later for this.
I noticed your patches are pushed. A quick test verified all is working well
now. Thanks!
Regards,
Jim