On 9/27/18 4:51 AM, Michal Privoznik wrote: > On 09/26/2018 11:14 PM, John Ferlan wrote: >> [...] >>>> I read the cover, it's not simpler than the above. >>> >>> Yeah, this is not that simple bug. I'll try to explain it differently: >>> >>> On fork(), all FDs of the parent are duplicated to child. So even if >>> parent closes one of them, the connection is not closed until child >>> closes the same FD. And since metadata locking is very fragile when it >>> comes to connection to virtlockd, this fork() behaviour actually plays >>> against us because it violates our assumption that connection is closed >>> at the end of metadataUnlock(). In fact it is not - child has it still open. >>> >>> If there was a flag like DONT_DUP_ON_FORK I could just set it on >>> virtlockd connection in metadataLock() and be done. >> >> Avoid the forking exec's works too ;-} >> >> OK - instead of talking in generalities - which forking dup'd fd is in >> play here? I don't want to assume or guess any more. > > The connection FD. > > Maybe I'm misreading your comments or the other way around. I'll try to > explain what is the problem one last time, in the simplest way I can. > > 1) Thread A wants to start a domain so it calls > virSecurityManagerMetadataLock(). > > 2) The function opens a connection to virtlockd (represented by FD) and > duplicates the FD so that connection is kept open until unlock. Yep, virLockManagerLockDaemonAcquire calls virNetClientDupFD(false) meaning I don't want one with the "FD_CLOEXEC" flag set. While normally that would be OK - is it OK for this path? Does anything change if the argument was true here (e.g. cloexec = priv->type == VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON)? It seems to me that the 3 patches together dance around two things - that the CLOEXEC is set and that TransactionCommit does the fork to avoid the parent/child duplication problem. > > 3) Now that locks are acquired, the thread does chown() and setfcon(). > > 4) the virSecurityManagerMetadataUnlock() function is called, which > causes virtlockd to release all the locks. Subsequently to that, the > function closes the duplicated FD obtained in step 2). > > The duplication is done because both virLockManagerAcquire() and > virLockManagerRelease() open new connection, do the RPC talk and close > the connection. That's just the way the functions are written. Now, > because of FD duplication happening in step 2) the connection from > virLockManagerAcquire() is not really closed until step 4) where the > duplicated FD is closed. > > This is important to realize: the connection is closed only when the > last FD representing it is closed. If I were to duplicate the connection > FD a hundred times and then close 99 of those duplicates the connection > would still be kept open. I've understood all this. > > Now, lets put some threads into the picture. Imagine thread B which > called fork() just in between steps 2) and 3). On fork, ALL partent FDs > are duplicated. So the connection FD is duplicated into the child > process too. Therefore, in step 4) where we think the connection is > closed - well it is not really because there is still one FD around - in > the child. This is exactly where the story gets cloudy for me. Is ThreadB related to ThreadA? It must be, because how would it get the duplicated fd, right? But ironically we save priv->clientfd, so if we know we've forked a child can we immediately close it? Knowing that our parent still has it open? Off in left field again? > > And this is huge problem, because we do not know when child is going to > close it. It may be (unintentionally) an evil child and close the FD at > the worst time possible - when libvirtd holds some locks. > > Important thing to realize #2: whenever a connection closes, virtlockd > looks into its internal records and if PID on the other side held some > locks then: a) the locks are releasd, b) the PID is killed. > > I'm sorry, I can't explain this any simpler. I'll try to find somebody > who understands this and have them review the patch. I think sometimes it's better to see actual calls and traces rather than using "imagine" if type scenarios. I'm sure this is *a lot* easier when you have a couple of debug sessions running and see things happening in real time. If someone else reviews this it doesn't matter to me, but I think this patch is wrong for a couple of reasons. That reasoning doesn't seem to be well received. If this patch was accepted as is, then you'd be calling virProcessRunInMountNamespace even when namespaces weren't being used, even for domain locks. Something I saw in the traces that Marc posted this morning and starting thinking, hey wait this code is only supposed to be run when namespaces are enabled. So I have to assume that's being used in his environment. Would the same problem exist if namespaces weren't being used? > > As an alternative approach, if we did not close the connection in 2) and Did you mean clone or close? > 4) we don't have to care when the child closes the FD because it will > not cause the connection to actually close. I wrote a patch like that > for previous versions but it was disliked. Maybe I can write it from > scratch again and have that one ACKed instead. > >> [...] >>> >>> I don't think there is any other way to prevent fork() cloning an FD. >>> >> >> OpenVMS (hahaha) > > I don't think we want to revert all metadata locking patches only to > implement support for different beast. Also, we don't need distributed > locking. > > Michal > I wasn't serious hence the (hahaha), but quite frankly what I know about dlm I believe it would certainly help, but that's not the point here. I do recall though always having some issue with fd locking in various projects I've been involved with. John -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list