As you may know, the Sepia Long Running Cluster has been hitting capacity limits over the past week or so. This has resulted in service disruptions to teuthology runs, chacra.ceph.com, docker-mirror.front.sepia.ceph.com, and quay.ceph.io. We've been able to get by with more aggressive log deletion/compression, but that's neither ideal nor sustainable.

Patrick has created a new erasure-coded pool/filesystem that will allow us to keep the same amount of logs while using less space [1]. In order to have teuthology workers start writing logs to that pool, we need to take an outage.

At 0400 UTC 19AUG2020, I will instruct all teuthology workers to die after their running jobs finish. At 1300 UTC, I will kill any jobs that are still running. This gives the lab 9 hours to shut down gracefully. At that point, we will switch the mountpoint on teuthology.front over to the new EC pool and start storing new logs there [2]. At the same time, Patrick will start migrating logs from the existing/old pool to the new pool [3]. This means that logs from 7/20 through 8/19 will be unavailable (you'll see 404s) via the Pulpito web UI and qa-proxy URLs until they're migrated to the new EC pool.

Let me know if you have any questions/concerns.

Thanks,

--
David Galloway
Systems Administrator, RDU
Ceph Engineering
IRC: dgalloway
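
[1] For anyone curious about the mechanics, setting up an EC data pool for CephFS generally looks something like the sketch below. The profile parameters and the pool/filesystem names are illustrative only, not necessarily what Patrick used:

  # define an erasure-code profile (k/m values chosen purely for illustration)
  ceph osd erasure-code-profile set teuth_ec_profile k=4 m=2
  # create the EC pool; CephFS requires overwrites enabled on EC data pools
  ceph osd pool create cephfs_data_ec 64 erasure teuth_ec_profile
  ceph osd pool set cephfs_data_ec allow_ec_overwrites true
  # attach it as an additional data pool on the existing filesystem
  ceph fs add_data_pool cephfs cephfs_data_ec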
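
[2] The mountpoint switch itself is roughly just a remount of the archive directory against the new EC-backed tree. The hostname, paths, and credential names below are placeholders, not the real lab values:

  # stop writing to the old pool, then remount against the new EC-backed tree
  umount /teuthology/archive
  mount -t ceph mon1.front.sepia.ceph.com:/teuthology-ec /teuthology/archive \
      -o name=teuthology,secretfile=/etc/ceph/teuthology.secret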
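
[3] Why the 404s: in CephFS, a file's layout (and therefore which pool its objects live in) is fixed when the file is created, so pointing the directory at the new pool only affects new files. The old logs physically have to be copied to land in the EC pool. Conceptually the migration is just something like the following (paths are placeholders):

  # copying rewrites each file, so its data is re-created in the new EC pool
  rsync -a /old-pool-mount/archive/ /new-ec-mount/archive/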