Copr outage - details

tl;dr: On Sunday 23rd February, there will be a Copr outage. It will last the whole day.
PPC64LE builders and chroots will be deactivated. The PPC64LE builders should be back in a matter of weeks.

Hi.
As previously announced, Fedora's infrastructure is moving to a different datacenter. For some servers, the move is
trivial. Copr servers are different. The Copr build system consists of four servers, plus four staging servers, with
eight TB of repos, four TB of dist-git, and several small volumes.

The original plan was to move to Washington D.C., to the IAD2 datacenter, by June. However, Copr is running in Fedora
OpenStack, and this cloud has to be evacuated by the beginning of March to free an IP range.
The plan was to move Copr to new hardware (thanks to Red Hat) and later move this HW to the new datacenter. That would
mean two outages, the second of which would last at least 15 days (!).

We were looking for another option and we found one. We are going to move Copr to Amazon AWS and shut down the old VMs
in Fedora Cloud. Then we move the new HW to the IAD2 datacenter, and finally we move Copr from AWS to the new HW in
IAD2. FYI, the final destination is still subject to change. This still means two outages, but each should last just a
few hours, and the web server with DNF repositories should be available the whole time.
The second outage will happen in May or June.

Here is a detailed schedule. We are going to keep this table updated during the migration, so you can use it to follow
the progress:

https://docs.google.com/spreadsheets/d/1jrCgdhseZwi91CTRlo9Y5DNwfl9VHoZfjHPK_pocuf4/edit?usp=sharing

Here is a short summary:

 * we are continuously running rsync to the new location
 * we spin up staging and production instances in the new location
 * on Sunday morning we stop the frontend, and therefore stop accepting new jobs. The backend with DNF repos will still
be operational.
 * we do the final rsync (~6 hours)
 * around 13:00 UTC we switch DNS to the new location
 * we then enable all services
 * once we confirm that everything is operational, the outage will be over

There are several caveats:

 * After we enable services on Sunday at 13:00 UTC, you may see some failures. Be assured that we will swiftly address
them.
 * Once we get out of Fedora Cloud, we will lose access to the PPC64LE builders. We are going to deactivate those
chroots just before the migration. We should get them back after a few weeks, but the exact ETA is unknown. The
worst-case scenario is June 2020. We will aim to bring them back as soon as possible.
 * Any small issue can easily change the schedule by hours. E.g., even a simple 'chown -R' on the backend runs ~4 hours.
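
To give a feeling for how sensitive the schedule is, here is a rough sketch in Python. It only uses the numbers from
this mail (final rsync ~6 hours, DNS switch around 13:00 UTC, a ~4 hour 'chown -R'). The function names and the
assumption that the steps run strictly one after another are illustrative and are not part of the actual migration
tooling. The frontend stop time it derives is an estimate, not an announced time.

```python
from datetime import datetime, timedelta, timezone

# Figures taken from the announcement. The year is inferred from the
# "June 2020" worst case mentioned for the PPC64LE builders.
DNS_SWITCH_TARGET = datetime(2020, 2, 23, 13, 0, tzinfo=timezone.utc)
FINAL_RSYNC = timedelta(hours=6)     # "final rsync (~6 hours)"
CHOWN_EXAMPLE = timedelta(hours=4)   # "'chown -R' on backend runs ~4 hours"


def frontend_stop_time(dns_switch=DNS_SWITCH_TARGET, rsync=FINAL_RSYNC):
    """Latest time the frontend can stop so the final rsync ends by the DNS switch.

    Hypothetical helper: assumes the final rsync starts right after the stop.
    """
    return dns_switch - rsync


def dns_switch_with_delay(frontend_stop, rsync=FINAL_RSYNC, extra=timedelta(0)):
    """Estimated DNS switch time if an unplanned step of length `extra` is needed.

    Hypothetical helper: assumes the extra step runs sequentially after the rsync.
    """
    return frontend_stop + rsync + extra


if __name__ == "__main__":
    stop = frontend_stop_time()
    print("Frontend stops no later than:", stop.isoformat())
    print("DNS switch if an extra chown -R is needed:",
          dns_switch_with_delay(stop, extra=CHOWN_EXAMPLE).isoformat())
```

With these assumptions, a single unplanned 'chown -R' would push the DNS switch from around 13:00 UTC to around
17:00 UTC, which is why the schedule above should be read as approximate.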

There are going to be three Copr engineers and one fedora-infrastructure member available for the whole of Sunday. If
you experience a problem, do not hesitate to contact us. We are on #fedora-buildsys on Freenode.

The link to the outage ticket is:
https://pagure.io/fedora-infrastructure/issue/8668

-- 
Miroslav Suchy, RHCA
Red Hat, Associate Manager ABRT/Copr, #brno, #fedora-buildsys
