Michal Privoznik <mprivozn@xxxxxxxxxx> [2018-09-27, 12:07PM +0200]: > On 09/27/2018 11:11 AM, Bjoern Walk wrote: > > I still don't understand why we need a timeout at all. If virtlockd is > > unable to get the lock, just bail and continue with what you did after > > the timeout runs out. Is this some kind of safety-measure? Against what? > > When there are two threads fighting over the lock. Imagine two threads > trying to set security label over the same file. Or imagine two daemons > on two distant hosts trying to set label on the same file on NFS. > virtlockd implements try-or-fail approach. So we need to call lock() > repeatedly until we succeed (or timeout). One thread wins and the other fails with lock contention? I don't see how repeatedly trying to acquire the lock will improve things, but maybe I am not getting it right now. > > There IS a lock held on the image, from the FIRST domain that we > > started. The second domain, which is using the SAME image, unshared, > > runs into the locking timeout. Sorry if I failed to describe this setup > > appropriately. I am starting to believe that this is expected behaviour, > > although it is not intuitive. > > > > # lslocks -u > > COMMAND PID TYPE SIZE MODE M START END PATH > > ... > > virtlockd 199062 POSIX 1.5G WRITE 0 0 0 /var/lib/libvirt/images/u1604.qcow2 > > But this should not clash, because metadata lock use different bytes: > the regular locking uses offset=0, metadata lock uses offset=1. Both > locks lock just one byte in length. It's the only lock in the output for the image. Should I see the metadata lock at start=1 as well? > However, from the logs it looks like virtlockd doesn't try to actually > acquire the lock because it found in internal records that the path is > locked (even though it is locked with different offset). Anyway, now I > am finally able to reproduce the issue and I'll look into this too. Good. > Quick and dirty patch might look something like this: > > diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c > index 79fa48d365..3c51d7926b 100644 > --- i/src/util/virlockspace.c > +++ w/src/util/virlockspace.c > @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr > lockspace, > virMutexLock(&lockspace->lock); > > if ((res = virHashLookup(lockspace->resources, resname))) { > - if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) && > - (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) { > + if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED && > + flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) || > + start == 1) { > > if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0) > goto cleanup; Had a quick test with this, but now it seems like that the whole metadata locking does nothing. When starting the second domain, I get an internal error from QEMU, failing to get the write lock. The SELinux labels of the image are changed as well. This is the same behaviour as with metadata locking disabled. Not entirely sure what should happen or if this is a error in my setup or not. I will have to think about this. > But as I say, this is very dirty hack. I need to find a better solution. > At least you might have something to test. > > Michal > > -- > libvir-list mailing list > libvir-list@xxxxxxxxxx > https://www.redhat.com/mailman/listinfo/libvir-list > -- IBM Systems Linux on Z & Virtualization Development -------------------------------------------------- IBM Deutschland Research & Development GmbH Schönaicher Str. 220, 71032 Böblingen Phone: +49 7031 16 1819 -------------------------------------------------- Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
Attachment:
signature.asc
Description: PGP signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list