On 9/21/18 5:29 AM, Michal Privoznik wrote: > There is this latent bug which can result in virtlockd killing > libvirtd. The problem is when the following steps occur: > > Parent | Child > ------------------------------------------------------+---------------- > 1) virSecurityManagerMetadataLock(path); | > This results in connection FD to virtlockd to be | > dup()-ed and saved for later use. | > | > 2) Another thread calling fork(); | > This results in the FD from step 1) to be cloned | > | > 3) virSecurityManagerMetadataUnlock(path); | > Here, the FD is closed, but the connection is | > still kept open because child process has | > duplicated FD | > | > 4) virSecurityManagerMetadataLock(differentPath); | > Again, new connection is opened, FD is dup()-ed | > | > 5) | exec() or exit() > > Step 5) results in closing the connection from 1) to be closed, > finally. However, at this point virtlockd looks at its internal > records if PID from 1) does not still own any resources. Well it > does - in step 4) it locked differentPath. This results in > virtlockd killing libvirtd. > > Note that this happens because we do some relabelling even before > fork() at which point vm->pid is still -1. When this value is > passed down to transaction code it simply runs the transaction > in parent process (=libvirtd), which means the owner of metadata > locks is the parent process. > > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> Always good to double up on the S-o-b's when there's locks and forks involved. That makes sure the fork gets one too ;-) > --- > src/security/security_dac.c | 12 ++++++------ > src/security/security_selinux.c | 12 ++++++------ > 2 files changed, 12 insertions(+), 12 deletions(-) > I read the cover, it's not simpler than the above. It also indicates that patches 1-3 don't have any bearing. So to me that says - 2 series, one the artsy/fartsy cleanup variety and the second this edge/corner condition chase based on timing and/or other interactions. My first question is - have you bisected your changes? Since this only seems to matter for metadata locking, you don't necessarily have to start too early, but I would assume that prior to commit 8b8aefb3 there is no issue. My second question is - given my analysis in patch2, was testing of this patch done with or without patch 2 and 3? Hard to review this without having my patch2 analysis rattling around in short term memory that works. At this point, I'm thinking they were in place and may have had an impact especially since that dup() call was removed, but it's mentioned during this commit message, which is really confusing. Prior to patch2/3, virNetSocketDupFD has 4 callers of which only one calls with 'false' and that's the path having problems. libxlMigrateDstReceive (using true) libxlDomainMigrationSrcPerform (using true) qemuMigrationSrcConnect (using true) virNetClientDupFD (using false) Since your testing certainly doesn't include the first 3, I'll focus on the last one which has but one caller: virLockManagerLockDaemonAcquire meaning without patches 2-3, we got a "dup()" socket back instead of changing the existing socket to F_DUPFD_CLOEXEC. I think I just talked myself in a circle, but perhaps gives you an understanding of the struggle with all the patches in one series. > diff --git a/src/security/security_dac.c b/src/security/security_dac.c > index 5aea386e7c..876cca0f2f 100644 > --- a/src/security/security_dac.c > +++ b/src/security/security_dac.c > @@ -561,12 +561,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, > goto cleanup; > } > > - if ((pid == -1 && > - virSecurityDACTransactionRun(pid, list) < 0) || > - (pid != -1 && > - virProcessRunInMountNamespace(pid, > - virSecurityDACTransactionRun, > - list) < 0)) > + if (pid == -1) > + pid = getpid(); > + > + if (virProcessRunInMountNamespace(pid, > + virSecurityDACTransactionRun, > + list) < 0) Going back to commit d41c1621, it essentially states: "To differentiate whether transaction code should fork() or not the @pid argument now accepts -1 (which means do not fork)." This commit then backtracks or undoes that by using a getpid() causing every transaction to fork. The "logic" for the decision to pass pid == -1 or not is introduced in the subsequent commit 37131ada based on qemuDomainNamespaceEnabled. Thus with this patch every domain w/ or w/o namespace and daemon commit path will fork. So if this logic is to stay, then there's no reason to have the namespace check either, is there? Is the vm->pid different than getpid()? IOW: I think this patch is too big of a leap. However, the contention from the cover is that this issue only happens when metadata locking is enabled. Thus, rather than getpid() for everyone, maybe it's a getpid() for metadata lock consumers. Seems for pid == -1 we could query @mgr for what it knows, like virSecurityManagerGetPrivateData and then if priv->type == VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON, use getpid(). Of course then @mgr loses its ATTRIBUTE_UNUSED. Furthermore, if virLockManagerLockDaemonAcquire knows that we have a daemon type lock, would that virNetClientDupFD need to be altered to use @true instead of @false? Maybe I'm over thinking this part - but with patch2 and patch3 adjustments, it's just not clear in my head anymore... John > goto cleanup; > > ret = 0; > diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c > index 31e42afee7..1e23d776ff 100644 > --- a/src/security/security_selinux.c > +++ b/src/security/security_selinux.c > @@ -1102,12 +1102,12 @@ virSecuritySELinuxTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED, > goto cleanup; > } > > - if ((pid == -1 && > - virSecuritySELinuxTransactionRun(pid, list) < 0) || > - (pid != -1 && > - virProcessRunInMountNamespace(pid, > - virSecuritySELinuxTransactionRun, > - list) < 0)) > + if (pid == -1) > + pid = getpid(); > + > + if (virProcessRunInMountNamespace(pid, > + virSecuritySELinuxTransactionRun, > + list) < 0) > goto cleanup; > > ret = 0; > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list