On Mon, 2005-08-01 at 12:03 -0400, Dan Williams wrote:
> On Mon, 1 Aug 2005, Paul Howarth wrote:
> > I see that a number of jobs have now made it into the queue, including
> > both of my requests (and some duplicates from other people too). I tried
> > killing one of my duplicate jobs about 20 minutes ago by doing:
> >
> > $ plague-client kill 282
> >
> > Shortly afterwards I received an email stating that the job had been
> > killed. However, the page
> > http://buildsys.fedoraproject.org/build-status/job.psp?uid=282 still
> > shows that job as "building" and in fact the plague-client command has
> > still not exited. This doesn't seem right...
>
> It appears that (as of last night) the build server was stuck in SSL_BIO_read()
> trying to receive data from hammer3. I killed the hammer3 plague-builder
> process, but the server didn't notice that because it was stuck in that
> function.
>
> Now the fix for this is to use socket timeouts, which essentially make the
> sockets non-blocking, but this leads to other problems (ie, socket.makefile()
> doesn't work well with socket.settimeout(), but we have to use makefile because
> the SSL sockets don't have a dup2()) that need to be dealt with as well. I hope
> that I can come up with some non-blocking solution here to deal with these
> issues. The worst thing is that these problems are completely non-reproducible
> and occur at random.
>
> The immediate solution is to restart the build server.

Thanks; that caused my original "plague-client kill 282" to produce a
traceback. I was then able to repeat the command, and it exited quickly
with:

$ plague-client kill 282
Success: job 282 killed.

This time it does seem to have worked.

There are still duplicates of gkrellm-freq-0_1_1-2_fc4 and
dejavu-fonts-1_12-1_fc3 in the queue, so the owners of those packages
may want to kill the dupes.

Paul.
-- 
Paul Howarth <paul@xxxxxxxxxxxx>
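
For anyone following the socket timeout discussion quoted above, here is a
rough sketch of the general idea in Python. It is not taken from plague: it
uses a plain local socket pair rather than SSL, and the names recv_or_none
and demo are made up for illustration. It calls recv() directly, so it does
not touch the makefile() problem Dan mentions.

```python
import socket


def recv_or_none(sock, nbytes, timeout_s):
    """Return up to nbytes from sock, or None if nothing arrives in time.

    Hypothetical helper, not part of plague.
    """
    sock.settimeout(timeout_s)  # recv() now raises instead of blocking forever
    try:
        return sock.recv(nbytes)
    except socket.timeout:
        return None


def demo():
    # A connected local pair stands in for the server <-> builder link.
    server_side, builder_side = socket.socketpair()
    try:
        # The "builder" is silent, so the read gives up after the timeout.
        first = recv_or_none(server_side, 1024, 0.2)
        # Once the "builder" sends something, the same call returns the data.
        builder_side.sendall(b"status: done")
        second = recv_or_none(server_side, 1024, 0.2)
        return first, second
    finally:
        server_side.close()
        builder_side.close()
```

With a timeout set, a peer that goes quiet shows up as a None return that
the caller can act on, instead of a read that never comes back.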