I found root cause on two nasty races today, but ran out of time to write the patches until next week. 1. We have a TOCTTOU race when moving processes between cgroups. virCgroupMoveTask can fail if listing the threads in the parent group picks up multiple threads, then one of those threads exits before we then write to the emulator subgroup to relocate those threads; the chain reaction from this failure is that the domain fails to start. qemu spawns short-lived threads all the time (possibly due to glibc using threads under the hood to implement aio as qemu first probes a disk image); with a 3-second-delay hook script, I was able to hit this race about 5% of the time when repeatedly starting multiple domains in parallel. The fix should be ignoring any ESRCH failures when writing during virCgroupAddTAskStrController, but I plan to test that before posting my patch. Present at least since 0.10.2 (Fedora 18); probably earlier (but I didn't check how far back). 2. We have a (rare) deadlock due to the use of a non-async-signal-safe getpwuid_r in between fork and exec. I was extremely surprised when I hit it in real life while trying to debug the first race; but sure enough, the backtrace confirms that glibc grabs a mutex inside getpwuid_r, and if some other thread in the parent process is interrupted in the middle of getpwuid_r while we are forking a child in our thread, then our child is hung; and since the parent depends on a handshake with the child, the parent's worker thread was also hung. I think the fix to this is to call getpwuid_r prior to fork, and change virSetUIDGID to take the list of supplemental groups as a parameter rather than trying to learn it on the fly. Again, something I plan to test once I have the patch written. Present at least since commit f42cf7cb (Dec 2010) with the introduction of virSetUIDGID, and probably further back. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Attachment:
signature.asc
Description: OpenPGP digital signature
-- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list