orch approved.
After reruns, the only failed jobs in the orch run were the orch/rook tests (which are currently broken) and 2 instances of "Test failure: test_non_existent_cluster". That failure is just a command expecting a zero return code and an error message instead of a nonzero return code in a failure case. I think the test got backported without the change to the error code; either way, it's not a big deal.
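To illustrate the kind of mismatch described above, here is a minimal Python sketch of two incompatible expectations for the same failure case. `CommandResult` and both checker functions are hypothetical names made up for this illustration; they are not taken from the actual test, and the sketch makes no claim about which convention the test or the backported code follows.

```python
from dataclasses import dataclass


@dataclass
class CommandResult:
    """Hypothetical container for the outcome of a CLI call."""
    returncode: int
    stderr: str


def passes_zero_rc_convention(result: CommandResult) -> bool:
    # Convention A: failure is reported with return code 0 plus an error message.
    return result.returncode == 0 and result.stderr.strip() != ""


def passes_nonzero_rc_convention(result: CommandResult) -> bool:
    # Convention B: failure is reported with a nonzero return code.
    return result.returncode != 0
```

A result that satisfies one convention necessarily fails the other, so a test written for one side will fail against code that implements the other, even though the command is behaving sensibly in both cases.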
I also took a brief look at the orchestrator failure from the upgrade tests (https://tracker.ceph.com/issues/59121) that Laura saw. In the instance I looked at, the test runs "orch upgrade start" and then doesn't run "orch upgrade pause" until about 20 minutes later, by which point the upgrade has already completed (I can see in the logs that all the daemons got upgraded to the new image). It looks like the test was waiting in a loop to see that a minority of the mons had been upgraded before pausing the upgrade, but even starting that loop took over 2 minutes, despite the only actions in between being a "ceph orch ps" call and echoing out a value. I'm really not sure why it was so slow to run those commands, or why it happened 3 times in the initial run but never in the reruns, but that is where the failure came from, and the upgrade itself still seems to work fine.
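As a rough sketch of the timing problem described above, the following Python function polls for a minority of upgraded mons and pauses once it sees one, while also reporting when the window has already been missed. This is not the actual test code: `pause_at_minority`, its parameters, and the returned strings are all hypothetical, and the two callables stand in for whatever the real test does around "ceph orch ps" and "orch upgrade pause".

```python
import time
from typing import Callable


def pause_at_minority(
    count_upgraded_mons: Callable[[], int],  # hypothetical: returns how many mons run the new image
    total_mons: int,
    pause_upgrade: Callable[[], None],       # hypothetical: issues the pause command
    poll_interval: float = 5.0,
    timeout: float = 600.0,
) -> str:
    """Poll until a minority of mons is upgraded, then pause the upgrade.

    Returns "paused" if the pause was issued in time, "missed" if half or
    more of the mons were already upgraded when checked, or "timeout" if
    no mon was upgraded before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        upgraded = count_upgraded_mons()
        # A minority means at least one mon, but fewer than half of them.
        if 0 < upgraded and 2 * upgraded < total_mons:
            pause_upgrade()
            return "paused"
        # Polling was too slow: the upgrade went past the point of interest.
        if 2 * upgraded >= total_mons:
            return "missed"
        if time.monotonic() >= deadline:
            return "timeout"
        time.sleep(poll_interval)
```

With this shape, a run in which the commands are slow would return "missed" instead of issuing a pause against an upgrade that has already finished, which makes a slow test environment easier to tell apart from a broken upgrade.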
- Adam King
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx