Michal Privoznik <mprivozn@xxxxxxxxxx> [2018-09-27, 10:15AM +0200]:
> On 09/27/2018 09:01 AM, Bjoern Walk wrote:
> > Michal Privoznik <mprivozn@xxxxxxxxxx> [2018-09-19, 11:45AM +0200]:
> >> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> >>> Bjoern Walk <bwalk@xxxxxxxxxxxxx> [2018-09-12, 01:17PM +0200]:
> >>>> Michal Privoznik <mprivozn@xxxxxxxxxx> [2018-09-12, 11:32AM +0200]:
> >>>
> >>> Still seeing the same timeout. Is this expected behaviour?
> >>>
> >>
> >> Nope. I wonder if something has locked the path and forgot to
> >> unlock it (however, virtlockd should have unlocked all the paths
> >> owned by PID on connection close), or something is still holding
> >> the lock and the connection open.
> >>
> >> Can you see the timeout even when you turn off the selinux driver
> >> (security_driver="none" in qemu.conf)? I tried to reproduce the
> >> issue yesterday and was unsuccessful. Do you have any steps to
> >> reproduce?
> >
> > So, I haven't been able to actually dig into the debugging but we
> > have been able to reproduce this behaviour on x86 (both with SELinux
> > and DAC). Looks like a general problem, if a problem at all, because
> > from what I can see in the code, the 60 seconds timeout is actually
> > intended, or not? The security manager does try for 60 seconds to
> > acquire the lock and only then bails out. Why is this?
>
> The ideal solution would be to just tell virtlockd "these are the paths I
> want you to lock on my behalf" and virtlockd would use F_SETLKW so that
> the moment all paths are unlocked virtlockd will lock them and libvirtd
> can continue its execution (i.e. chown() and setfilecon()). However, we
> can't do this because virtlockd runs single threaded [1] and therefore
> if one thread is waiting for lock to be acquired there is no other
> thread to unlock the path.
>
> Therefore I had to move the logic into libvirtd which tries repeatedly
> to lock all the paths needed. And this is where the timeout steps in -
> the lock acquiring attempts are capped at 60 seconds. This is an
> arbitrarily chosen timeout. We can make it smaller, but that will not
> solve your problem - only mask it.

I still don't understand why we need a timeout at all. If virtlockd is
unable to get the lock, just bail and continue with what you did after
the timeout runs out. Is this some kind of safety measure? Against what?

> >
> > However, an actual bug is that while the security manager waits to
> > acquire the lock the whole libvirtd hangs, but from what I understood
> > and Marc explained to me, this is more of a pathological error in
> > libvirt behaviour with various domain locks.
> >
>
> Whole libvirtd shouldn't hang. Only those threads which try to acquire
> domain object lock. IOW it should be possible to run APIs over different
> domains (or other objects for that matter). For instance dumpxml over
> different domain works just fine.

Yes, but from a user perspective, this is still pretty bad and
unexpected. libvirt should continue to operate as usual while virtlockd
is figuring out the locking.

> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock it for metadata the
> timeout is hit and whole operation fails. Do you mind running 'lslocks
> -u' when starting a domain and before the timeout is hit?

There IS a lock held on the image, from the FIRST domain that we
started. The second domain, which is using the SAME image, unshared,
runs into the locking timeout. Sorry if I failed to describe this setup
appropriately. I am starting to believe that this is expected behaviour,
although it is not intuitive.

# lslocks -u
COMMAND      PID  TYPE SIZE MODE  M START END PATH
...
virtlockd 199062 POSIX 1.5G WRITE 0     0   0 /var/lib/libvirt/images/u1604.qcow2
...

>
> Michal
>
> 1: The reason that virtlockd has to run single threaded is stupidity of
> POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
> ...) and entering critical section. If another thread does open() +
> close() on the same file the file is unlocked. Because we can't
> guarantee this will not happen in multithreaded libvirt we had to
> introduce a separate process to take care of that. And this process has
> to be single threaded so there won't ever be the second thread to call
> close() and unintentionally release the lock.

Thanks for the explanation, I will have some reading to do to get a
better overview of the locking process in Linux.

>
> Perhaps we could use OFD locks but those are not part of POSIX and are
> available on Linux only.
>

--
IBM Systems
Linux on Z & Virtualization Development
--------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--------------------------------------------------
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
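
The POSIX lock behaviour described in footnote [1] above can be reproduced outside libvirt. The following is only a minimal sketch, not libvirt or virtlockd code: it assumes a Linux (or other POSIX) system, and the helper names lock_is_free() and demo() are made up for this illustration. A process takes an fcntl()-style lock on a scratch file, then does an unrelated open() + close() on the same file, and a forked child checks before and after whether the lock is still held.

```python
import fcntl
import os
import tempfile


def lock_is_free(path):
    """Return True if a *separate process* can take an exclusive lock on path.

    Hypothetical helper for this sketch: the check runs in a forked child,
    because POSIX locks never conflict within the process that owns them.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: try a non-blocking exclusive lock and report the
        # outcome through the exit status (0 = got the lock).
        status = 1
        try:
            fd = os.open(path, os.O_RDWR)
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            status = 0
        except OSError:
            status = 1
        finally:
            os._exit(status)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0


def demo():
    """Show that closing *any* descriptor for a file drops the POSIX lock."""
    fd, path = tempfile.mkstemp()
    try:
        # Take the lock through the first descriptor.
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        held_before = not lock_is_free(path)

        # Unrelated open() + close() of the same file in the same process,
        # standing in for "another thread" in the footnote.
        other = os.open(path, os.O_RDONLY)
        os.close(other)
        held_after = not lock_is_free(path)
        return held_before, held_after
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    before, after = demo()
    print("lock held before unrelated close():", before)
    print("lock held after unrelated close(): ", after)
```

The lock is reported as held before the second descriptor is closed and as gone afterwards, even though the descriptor that took the lock is still open. This is the hazard that motivates keeping the lock holder in a separate, single-threaded process.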
--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list