Strictly speaking, only the last patch actually fixes a real problem. The first three are more of a cleanup than anything. However, I am still sending them because they make the code better. Anyway, if my explanation in 4/4 is not clear enough, here it is in simpler terms: It's important to bear in mind that at any point when connection to virtlockd is closed, virtlockd goes through its hash table and if it finds any resources acquired for the process that has closed the connection, it will kill the process. In other words, resources are per PID not per connection. For instance, if libvirtd would open two connections, acquired two different resources through each of them and then closed one it would get killed instantly because it still owns some resources. So now that this virtlockd behaviour is clear (I must say intended and desired behaviour), we can start looking into how libvirtd runs qemu. Basically, problem is that instead of opening connection to virtlockd once and keeping it open through whole run of libvirtd I used this shortcut: open the connection in virSecurityManagerMetadataLock(), dup() the connection FD, and close it in virSecurityManagerMetadataUnlock(). In theory, this works pretty well - when the duplicated FD is closed, libvirtd doesn't own any locks anymore so virtlockd doesn't kill it. What my design did not count with is fork(). On fork() the duplicated FD is cloned into the child and thus connection is closed nobody knows when. Meantime, libvirtd might continue its execution (e.g. starting another domain from another thread) and call virSecurityManagerMetadataLock(). However, before it gets to call Unlock() the child process decides to close the FD (either via exec() or exit()). Given virtlockd behaviour described couple of paragraphs above I guess it is clear to see now that virtlockd kills libvirtd. My solution to this is to always run secdriver transactions from a separate child process because on fork() only the thread that calls it is cloned. So only the thread that runs the transaction is cloned, not the one that is starting a different domain. Therefore no other fork() can occur here and therefore we are safe. I know, I know, it's complicated. But it always is around fork() and IPC. If you want to see my fix in action, just enable metadata locking, and try to start two or more domains in a loop: for ((i=0;i<1000;i++)); do \ virsh start u1604-1 & virsh start u1604-2 &\ sleep 3;\ virsh destroy u1604-1; virsh destroy u1604-2;\ done At some point you'll see virsh reporting I/O error (this is because virtlockd killed libvirtd). With my patch, I run the test multiple times without any hiccup. @Bjoern: I'm still unable to reproduce the issue you reported. However, whilst trying to do so I've came across this bug. However, my gut feeling is that this might help you. Michal Prívozník (4): security: Grab a reference to virSecurityManager for transactions virNetSocket: Be more safe with fork() around virNetSocketDupFD() virLockManagerLockDaemonAcquire: Duplicate client FD with CLOEXEC flag security: Always spawn process for transactions src/libxl/libxl_migration.c | 4 ++-- src/locking/domain_lock.c | 18 ++++++++++++++++++ src/locking/lock_driver_lockd.c | 2 +- src/qemu/qemu_migration.c | 2 +- src/rpc/virnetclient.c | 5 +++-- src/rpc/virnetclient.h | 2 +- src/rpc/virnetsocket.c | 7 ++----- src/rpc/virnetsocket.h | 2 +- src/security/security_dac.c | 17 +++++++++-------- src/security/security_selinux.c | 17 +++++++++-------- 10 files changed, 47 insertions(+), 29 deletions(-) -- 2.16.4 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list