[PATCH 0/4] Couple of metadata locking fixes

Michal Privoznik <mprivozn@xxxxxxxxxx> · Fri, 21 Sep 2018 11:29:55 +0200

Strictly speaking, only the last patch actually fixes a real problem.
The first three are more of a cleanup than anything. However, I am still
sending them because they make the code better.

Anyway, if my explanation in 4/4 is not clear enough, here it is in
simpler terms:

It's important to bear in mind that at any point when connection to
virtlockd is closed, virtlockd goes through its hash table and if it
finds any resources acquired for the process that has closed the
connection, it will kill the process. In other words, resources are per
PID not per connection.

For instance, if libvirtd would open two connections, acquired two
different resources through each of them and then closed one it would
get killed instantly because it still owns some resources.

So now that this virtlockd behaviour is clear (I must say intended and
desired behaviour), we can start looking into how libvirtd runs qemu.

Basically, problem is that instead of opening connection to virtlockd
once and keeping it open through whole run of libvirtd I used this
shortcut: open the connection in virSecurityManagerMetadataLock(), dup()
the connection FD, and close it in virSecurityManagerMetadataUnlock().

In theory, this works pretty well - when the duplicated FD is closed,
libvirtd doesn't own any locks anymore so virtlockd doesn't kill it.
What my design did not count with is fork(). On fork() the duplicated
FD is cloned into the child and thus connection is closed nobody knows
when. Meantime, libvirtd might continue its execution (e.g. starting
another domain from another thread) and call
virSecurityManagerMetadataLock(). However, before it gets to call
Unlock() the child process decides to close the FD (either via exec() or
exit()). Given virtlockd behaviour described couple of paragraphs above
I guess it is clear to see now that virtlockd kills libvirtd.

My solution to this is to always run secdriver transactions from a
separate child process because on fork() only the thread that calls it
is cloned. So only the thread that runs the transaction is cloned, not
the one that is starting a different domain. Therefore no other fork()
can occur here and therefore we are safe.

I know, I know, it's complicated. But it always is around fork() and
IPC.

If you want to see my fix in action, just enable metadata locking, and
try to start two or more domains in a loop:

  for ((i=0;i<1000;i++)); do \
      virsh start u1604-1 &  virsh start u1604-2 &\
      sleep 3;\
      virsh destroy u1604-1; virsh destroy u1604-2;\
  done

At some point you'll see virsh reporting I/O error (this is because
virtlockd killed libvirtd). With my patch, I run the test multiple times
without any hiccup.

@Bjoern: I'm still unable to reproduce the issue you reported. However,
whilst trying to do so I've came across this bug. However, my gut
feeling is that this might help you.

Michal Prívozník (4):
  security: Grab a reference to virSecurityManager for transactions
  virNetSocket: Be more safe with fork() around virNetSocketDupFD()
  virLockManagerLockDaemonAcquire: Duplicate client FD with CLOEXEC flag
  security: Always spawn process for transactions

 src/libxl/libxl_migration.c     |  4 ++--
 src/locking/domain_lock.c       | 18 ++++++++++++++++++
 src/locking/lock_driver_lockd.c |  2 +-
 src/qemu/qemu_migration.c       |  2 +-
 src/rpc/virnetclient.c          |  5 +++--
 src/rpc/virnetclient.h          |  2 +-
 src/rpc/virnetsocket.c          |  7 ++-----
 src/rpc/virnetsocket.h          |  2 +-
 src/security/security_dac.c     | 17 +++++++++--------
 src/security/security_selinux.c | 17 +++++++++--------
 10 files changed, 47 insertions(+), 29 deletions(-)

-- 
2.16.4

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list