Ceph Community Infrastructure Outage

Hi everyone,


From November into January, we experienced a series of outages affecting the Ceph Community Infrastructure and its services.


These services are now mostly restored, but we did experience some data loss, notably in our mailing lists. We have restored them from backups, but subscription changes made after July 2021 need to be repeated. If you subscribed or unsubscribed since then, please check your settings with the appropriate list at https://lists.ceph.io. If your posts to our mailing lists now require moderator approval, that is another sign that you need to re-subscribe to the appropriate lists.

Keep an eye out for emails with subject lines such as “Your message to ceph-users@xxxxxxx awaits moderator approval”.
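
For example, assuming the lists follow the standard GNU Mailman join/leave address convention (the same convention as the dev-leave@ address in the footer of this message), you can re-subscribe by sending an empty email to the list's -join address:

    To: ceph-users-join@xxxxxxx    (empty subject and body; Mailman replies with a confirmation request)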


When the community infrastructure was first created in late 2014, the VM cluster management software the team selected had the advantage of being widely used and familiar to the lab administrators, but it did not support Ceph as a storage backend at the time. As services grew, we relied more and more on its legacy storage solution, which was never migrated to Ceph. Over the last few months, this legacy storage solution suffered several instances of silent data corruption, rendering VMs unbootable, taking down various services, and in many cases requiring restoration from backups.


We are moving these services to a more reliable, mostly container-based, infrastructure backed by Ceph, and planning for longer-term improvements to monitoring, backups, deployment, and other pieces of the project infrastructure.
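
As a rough sketch of what container storage backed by Ceph can look like (this announcement does not specify our tooling, so the Kubernetes PersistentVolumeClaim and the ceph-rbd StorageClass name below are illustrative assumptions, not a description of our actual deployment):

    # Hypothetical PersistentVolumeClaim: a containerized service requests
    # a volume from a StorageClass provisioned by the ceph-csi RBD driver.
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: service-data          # hypothetical volume for a service's data
    spec:
      accessModes:
        - ReadWriteOnce           # mounted read-write by a single node
      resources:
        requests:
          storage: 20Gi
      storageClassName: ceph-rbd  # assumed ceph-csi-backed StorageClass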


These events highlight the need to better support the infrastructure. A handful of contributors have stepped up to restore these services, but we need an invested team focused on its long-term upkeep.


If you or your company is looking for a great way to contribute to the Ceph community, this could be your opportunity. Please contact council@xxxxxxx if you can provide time to contribute to the Ceph Community Infrastructure and would like to join the team. You can also join the upstream #sepia Slack channel to participate in these discussions via this link: https://join.slack.com/t/ceph-storage/shared_invite/zt-1n1eh6po5-PF9sokUSooOf1ZkVdqrPUQ


Unfortunately, these events have slowed down our upstream development and releases. We are currently working on publishing the next Pacific point release. The development freeze and release deadline for the Reef release will likely be pushed out, with more discussion to follow in the Ceph Leadership Team meetings.


- The Ceph Leadership Team
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
