On Thu, Sep 27, 2018 at 10:15 AM +0200, Michal Privoznik <mprivozn@xxxxxxxxxx> wrote: > On 09/27/2018 09:01 AM, Bjoern Walk wrote: >> Michal Privoznik <mprivozn@xxxxxxxxxx> [2018-09-19, 11:45AM +0200]: >>> On 09/19/2018 11:17 AM, Bjoern Walk wrote: >>>> Bjoern Walk <bwalk@xxxxxxxxxxxxx> [2018-09-12, 01:17PM +0200]: >>>>> Michal Privoznik <mprivozn@xxxxxxxxxx> [2018-09-12, 11:32AM >>>>> +0200]: >>> >>>>> >>>> >>>> Still seeing the same timeout. Is this expected behaviour? >>>> >>> >>> Nope. I wonder if something has locked the path and forgot to >>> unlock it (however, virtlockd should have unlocked all the paths >>> owned by PID on connection close), or something is still holding >>> the lock and connection opened. >>> >>> Can you see the timeout even when you turn off the selinux driver >>> (security_driver="none' in qemu.conf)? I tried to reproduce the >>> issue yesterday and was unsuccessful. Do you have any steps to >>> reproduce? >> >> So, I haven't been able to actually dig into the debugging but we >> have been able to reproduce this behaviour on x86 (both with SELinux >> and DAC). Looks like a general problem, if a problem at all, because >> from what I can see in the code, the 60 seconds timeout is actually >> intended, or not? The security manager does try for 60 seconds to >> acquire the lock and only then bails out. Why is this? > > The ideal solution would be to just tell virlockd "these are the paths I > want you to lock on my behalf" and virtlockd would use F_SETLKW so that > the moment all paths are unlocked virtlockd will lock them and libvirtd > can continue its execution (i.e. chown() and setfcon()). However, we > can't do this because virtlockd runs single threaded [1] and therefore > if one thread is waiting for lock to be acquired there is no other > thread to unlock the path. > > Therefore I had to move the logic into libvirtd which tries repeatedly > to lock all the paths needed. And this is where the timeout steps in - > the lock acquiring attempts are capped at 60 seconds. This is an > arbitrary chosen timeout. We can make it smaller, but that will not > solve your problem - only mask it. > >> >> However, an actual bug is that while the security manager waits for >> the lock acquire the whole libvirtd hangs, but from what I understood >> and Marc explained to me, this is more of a pathological error in >> libvirt behaviour with various domain locks. >> > > Whole libvirtd shouldn't hang. The main loop doesn’t hang. > Only those threads which try to acquire > domain object lock. IOW it should be possible to run APIs over different > domains (or other objects for that matter). For instance dumpxml over > different domain works just fine. Oh sry, my assumption that the thread A is also holding the virDomainObjList lock is wrong! Here is the actual backtrace of a waiting thread (in this case the “worker thread for the "virsh list" command”) (gdb) bt #0 0x00007f850887b7ed in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f8508874cf4 in pthread_mutex_lock () from /lib64/libpthread.so.0 #2 0x00007f850c26cfd9 in virMutexLock (m=<optimized out>) at util/virthread.c:89 #3 0x00007f850c2455df in virObjectLock (anyobj=<optimized out>) at util/virobject.c:429 #4 0x00007f850c2ceaba in virDomainObjListFilter (list=list@entry=0x7f84f47a97c0, nvms=nvms@entry=0x7f84f47a97c8, conn=conn@entry=0x7f84dc01ad00, filter=filter@entry=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=flags@entry=1) at conf/virdomainobjlist.c:919 #5 0x00007f850c2cfef2 in virDomainObjListCollect (domlist=0x7f848c10e290, conn=conn@entry=0x7f84dc01ad00, vms=vms@entry=0x7f84f47a9820, nvms=nvms@entry=0x7f84f47a9830, filter=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=1) at conf/virdomainobjlist.c:961 #6 0x00007f850c2d01a6 in virDomainObjListExport (domlist=<optimized out>, conn=0x7f84dc01ad00, domains=0x7f84f47a9890, filter=<optimized out>, flags=<optimized out>) at conf/virdomainobjlist.c:1042 #7 0x00007f850c426be9 in virConnectListAllDomains (conn=0x7f84dc01ad00, domains=0x7f84f47a9890, flags=1) at libvirt-domain.c:6493 #8 0x000055a82b2d905d in remoteDispatchConnectListAllDomains (server=0x55a82c9e16d0, msg=0x55a82ca53380, args=0x7f84dc01ae30, args=0x7f84dc01ae30, ret=0x7f84dc005bb0, rerr=0x7f84f47a9960, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:1372 #9 remoteDispatchConnectListAllDomainsHelper (server=0x55a82c9e16d0, client=<optimized out>, msg=0x55a82ca53380, rerr=0x7f84f47a9960, args=0x7f84dc01ae30, ret=0x7f84dc005bb0) at remote/remote_daemon_dispatch_stubs.h:1348 #10 0x00007f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53380, client=0x55a82ca52a40, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437 … virDomainObjListFilter needs the lock of every domain… That’s the actual problem. > > Anyway, we need to get to the bottom of this. Looks like something keeps > the file locked and then when libvirt wants to lock if for metadata the > timeout is hit and whole operation fails. Do you mind running 'lslocks > -u' when starting a domain and before the timeout is hit? > > Michal > > 1: The reason that virtlockd has to run single threaded is stupidity of > POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW, > ...) and entering critical section. If another thread does open() + > close() on the same file the file is unlocked. Because we can't > guarantee this will not happen in multithreaded libvirt we had to > introduce a separate process to take care of that. And this process has > to be single threaded so there won't ever be the second thread to call > close() and unintentionally release the lock. > > Perhaps we could use OFD locks but those are not part of POSIX and are > available on Linux only. > -- Kind regards / Beste Grüße Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list