On Fri, 16 Apr 2010 13:16:56 -0400
Jeff Garzik <jeff@xxxxxxxxxx> wrote:

> Build #1 (fails on x86_64):
> http://koji.fedoraproject.org/koji/taskinfo?taskID=2119825

I think current tabled is much better; it should not stumble on the
"100s" thing as much. Unfortunately, it's still not completely reliable.
I see this (although very infrequently):

PASS: prep-db
chunkd[19052]: Waiting for CLD PortFile cld.port
cld[19051]: databases up
cld[19051]: Listening on port 56141
cld[19051]: initialized: nodebug
chunkd[19052]: Using CLD port 56141
tabled[19055]: Listening on port 44610
tabled[19055]: New CLD session created, sid 4C7619861D42473D
tabled[19055]: /chunk-default: open failed, retrying
chunkd[19053]: Listening on auto port 48660
PASS: start-daemon
PASS: pid-exists
PASS: daemon-running
tabled[19055]: /chunk-default: open failed, retrying
tabled[19055]: /chunk-default: open failed, retrying
tabled[19055]: /chunk-default: open failed, retrying
tabled[19055]: /chunk-default: open failed, retrying
    <------------ at this point tabled exits
cld[19051]: session timeout, addr ::1 sid 4C7619861D42473D
chunkd[19053]: New CLD session created, sid 4C7619861D42473D
chunkd[19053]: initialized
    <------------ great, too late
^Cmake[2]: *** [check-TESTS] Interrupt

So, tabled retries, but gives up too early. Of course the knee-jerk
reaction would be to change the max retries from 5 to 10...

The problem is that I have a vague suspicion that something is fishy.
The root of the 100s problem was that CLD gets delayed just a tiny bit,
enough for clients to start and fail the first round of sessions.
That's fine, we deal with it now. But in the above log CLD seems to be
available enough for tabled to initiate at least, so why does Chunk
have to retry?

-- Pete
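
For illustration only, here is a minimal sketch of the bounded-retry behaviour discussed above, and of why raising the limit from 5 to 10 hides the symptom without explaining it. This is not tabled's code: the names open_with_retries, ready_on_attempt and no_sleep are hypothetical, and the peer that becomes ready on the sixth attempt is an assumption made up to mirror the log. Only the numbers 5 and 10 come from the message.

```python
import time


class OpenFailed(Exception):
    """Raised when the resource is still unavailable after all attempts."""


def open_with_retries(try_open, max_attempts, delay_s=1.0, sleep=time.sleep):
    """Call try_open() until it returns True or max_attempts is used up.

    Returns the number of attempts made on success, raises OpenFailed
    once the limit is reached without a successful open.
    """
    for attempt in range(1, max_attempts + 1):
        if try_open():
            return attempt
        # Wait between attempts, but not after the final failed one.
        if attempt < max_attempts:
            sleep(delay_s)
    raise OpenFailed(f"gave up after {max_attempts} attempts")


def ready_on_attempt(n):
    """Hypothetical stand-in for a peer that only becomes ready on try n."""
    calls = {"count": 0}

    def try_open():
        calls["count"] += 1
        return calls["count"] >= n

    return try_open


def no_sleep(_seconds):
    """Replacement for time.sleep so the sketch runs instantly."""
```

With a peer that becomes ready on the sixth attempt, open_with_retries(ready_on_attempt(6), 5, sleep=no_sleep) raises OpenFailed, while the same call with a limit of 10 returns 6. A larger limit only widens the window; it does not say why the peer was late in the first place.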