On Thu, 2008-06-26 at 17:16 -0400, Dan Williams wrote:
> On Thu, 2008-06-26 at 15:41 -0500, Jason L Tibbitts III wrote:
> > >>>>> "JB" == Josh Boyer <jwboyer@xxxxxxxxx> writes:
> >
> > JB> That might have had a bigger effect.  I thought koji would only run
> > JB> one build job per builder?  Or is it per CPU?
> >
> > I don't know what koji does, but in this case koji was unaware that
> > the jobs were still running.  I guess they had been killed from the
> > server but not cleaned up on the builders.
>
> This happened a lot with plague too.  I think it's Just Hard in *NIX to
> ensure that all descendants of a given task have been killed dead dead
> dead.  Maybe they somehow get out of the parent's process group, they
> are just hung and don't respond to signals, they are in D state when the
> signals get sent, whatever.  Running craploads of scripts and programs
> as part of the build process that fork and exec and do God-knows-what
> doesn't lend itself to being cleaned up easily.
>
> I think either cgroups (?) or putting each build in a clean VM which can
> be torn down completely is probably the answer.  And out of those two, a
> whole new VM would be pretty heavy to create/destroy so it's probably
> out of the question.

And impossible on ppc without LPAR support for every builder.  Hopefully
containers will help when the builders move to RHEL6 (if RHEL6 supports
containers...).

josh
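
To make the process-group point in the thread concrete, here is a minimal sketch of the usual best-effort approach: start the build in its own session so it leads a fresh process group, then signal the whole group on timeout. The names run_build and kill_group are hypothetical and are not taken from koji or plague. This shows exactly the limits described above: a descendant that calls setsid() itself leaves the group and is missed, and a process stuck in D state will not act on the signal until it returns from the kernel.

```python
import os
import signal
import subprocess
import time


def kill_group(pgid, grace=2.0):
    """Send SIGTERM to every process in the group, then SIGKILL after a grace period."""
    try:
        os.killpg(pgid, signal.SIGTERM)
        time.sleep(grace)
        # Anything that ignored or blocked SIGTERM gets SIGKILL.
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # no process is left in the group


def run_build(cmd, timeout, grace=2.0):
    """Run cmd in its own session; kill its whole process group on timeout.

    Returns the exit code, or None if the group had to be killed.
    (Hypothetical helper for illustration only.)
    """
    # start_new_session=True makes the child call setsid(), so it becomes
    # the leader of a new process group whose ID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        kill_group(proc.pid, grace)
        proc.wait()  # reap the direct child so it does not linger as a zombie
        return None
```

For example, run_build(["sh", "-c", "sleep 30 & sleep 30"], timeout=0.5) signals both sleep processes because the background one inherits the group. It cannot catch a script that daemonizes itself, which is the gap that cgroups or a throwaway VM would close.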