Hi Ceph,

Today I did something wrong, and it blocked the lab for a good half hour:

a) I ran two teuthology-kill commands simultaneously, and they deadlocked each other;
b) I left them running unattended, only to come back to the terminal 30 minutes later and find them stuck.

Sure, two simultaneous teuthology-kill runs should not deadlock, and that needs to be fixed. But the easy way to avoid the trouble is simply not to let it run forever. Even for ~200 jobs it takes a minute or two at most. If it takes longer, it probably means another teuthology-kill is competing with it, and it should be interrupted and restarted later. From now on I'll run

    timeout 120 teuthology-kill .... || echo FAIL!

as a generic safeguard.

Apologies for the trouble.

--
Loïc Dachary, Artisan Logiciel Libre