Re: [PATCH v4 00/23] Introduce metadata locking

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Sep 27, 2018 at 10:15 AM +0200, Michal Privoznik <mprivozn@xxxxxxxxxx> wrote:
> On 09/27/2018 09:01 AM, Bjoern Walk wrote:
>> Michal Privoznik <mprivozn@xxxxxxxxxx> [2018-09-19, 11:45AM +0200]:
>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>>> Bjoern Walk <bwalk@xxxxxxxxxxxxx> [2018-09-12, 01:17PM +0200]:
>>>>> Michal Privoznik <mprivozn@xxxxxxxxxx> [2018-09-12, 11:32AM
>>>>> +0200]:
>>>
>>>>>
>>>>
>>>> Still seeing the same timeout. Is this expected behaviour?
>>>>
>>>
>>> Nope. I wonder if something has locked the path and forgot to
>>> unlock it (however, virtlockd should have unlocked all the paths
>>> owned by PID on connection close), or something is still holding
>>> the lock and connection opened.
>>>
>>> Can you see the timeout even when you turn off the selinux driver
>>> (security_driver="none' in qemu.conf)? I tried to reproduce the
>>> issue yesterday and was unsuccessful. Do you have any steps to
>>> reproduce?
>>
>> So, I haven't been able to actually dig into the debugging but we
>> have been able to reproduce this behaviour on x86 (both with SELinux
>> and DAC). Looks like a general problem, if a problem at all, because
>> from what I can see in the code, the 60 seconds timeout is actually
>> intended, or not? The security manager does try for 60 seconds to
>> acquire the lock and only then bails out. Why is this?
>
> The ideal solution would be to just tell virlockd "these are the paths I
> want you to lock on my behalf" and virtlockd would use F_SETLKW so that
> the moment all paths are unlocked virtlockd will lock them and libvirtd
> can continue its execution (i.e. chown() and setfcon()). However, we
> can't do this because virtlockd runs single threaded [1] and therefore
> if one thread is waiting for lock to be acquired there is no other
> thread to unlock the path.
>
> Therefore I had to move the logic into libvirtd which tries repeatedly
> to lock all the paths needed. And this is where the timeout steps in -
> the lock acquiring attempts are capped at 60 seconds. This is an
> arbitrary chosen timeout. We can make it smaller, but that will not
> solve your problem - only mask it.
>
>>
>> However, an actual bug is that while the security manager waits for
>> the lock acquire the whole libvirtd hangs, but from what I understood
>> and Marc explained to me, this is more of a pathological error in
>> libvirt behaviour with various domain locks.
>>
>
> Whole libvirtd shouldn't hang.

The main loop doesn’t hang.

> Only those threads which try to acquire
> domain object lock. IOW it should be possible to run APIs over different
> domains (or other objects for that matter). For instance dumpxml over
> different domain works just fine.

Oh sry, my assumption that the thread A is also holding the
virDomainObjList lock is wrong!

Here is the actual backtrace of a waiting thread (in this case the
“worker thread for the "virsh list" command”)

(gdb) bt
#0  0x00007f850887b7ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8508874cf4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f850c26cfd9 in virMutexLock (m=<optimized out>) at util/virthread.c:89
#3  0x00007f850c2455df in virObjectLock (anyobj=<optimized out>) at util/virobject.c:429
#4  0x00007f850c2ceaba in virDomainObjListFilter (list=list@entry=0x7f84f47a97c0, nvms=nvms@entry=0x7f84f47a97c8, conn=conn@entry=0x7f84dc01ad00, filter=filter@entry=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=flags@entry=1) at conf/virdomainobjlist.c:919
#5  0x00007f850c2cfef2 in virDomainObjListCollect (domlist=0x7f848c10e290, conn=conn@entry=0x7f84dc01ad00, vms=vms@entry=0x7f84f47a9820, nvms=nvms@entry=0x7f84f47a9830, filter=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=1) at conf/virdomainobjlist.c:961
#6  0x00007f850c2d01a6 in virDomainObjListExport (domlist=<optimized out>, conn=0x7f84dc01ad00, domains=0x7f84f47a9890, filter=<optimized out>, flags=<optimized out>) at conf/virdomainobjlist.c:1042
#7  0x00007f850c426be9 in virConnectListAllDomains (conn=0x7f84dc01ad00, domains=0x7f84f47a9890, flags=1) at libvirt-domain.c:6493
#8  0x000055a82b2d905d in remoteDispatchConnectListAllDomains (server=0x55a82c9e16d0, msg=0x55a82ca53380, args=0x7f84dc01ae30, args=0x7f84dc01ae30, ret=0x7f84dc005bb0, rerr=0x7f84f47a9960, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:1372
#9  remoteDispatchConnectListAllDomainsHelper (server=0x55a82c9e16d0, client=<optimized out>, msg=0x55a82ca53380, rerr=0x7f84f47a9960, args=0x7f84dc01ae30, ret=0x7f84dc005bb0) at remote/remote_daemon_dispatch_stubs.h:1348
#10 0x00007f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53380, client=0x55a82ca52a40, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437
…

virDomainObjListFilter needs the lock of every domain… That’s the actual
problem.

>
> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock if for metadata the
> timeout is hit and whole operation fails. Do you mind running 'lslocks
> -u' when starting a domain and before the timeout is hit?
>
> Michal
>
> 1: The reason that virtlockd has to run single threaded is stupidity of
> POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
> ...) and entering critical section. If another thread does open() +
> close() on the same file the file is unlocked. Because we can't
> guarantee this will not happen in multithreaded libvirt we had to
> introduce a separate process to take care of that. And this process has
> to be single threaded so there won't ever be the second thread to call
> close() and unintentionally release the lock.
>
> Perhaps we could use OFD locks but those are not part of POSIX and are
> available on Linux only.
>
--
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list




[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux