Lost Journals for XFS OSDs

Tonight, an old Ceph cluster we run suffered a hardware failure that destroyed the Ceph journal SSDs on 7 of its 36 nodes. An overview of this old setup:

- Super-old Ceph Dumpling v0.67
- 3x replication for RBD, with 3 failure domains in the CRUSH hierarchy
- OSDs on XFS on spinning disks, with journals on SSDs

In total, we lost 7 SSDs hosting journals for 21 OSDs (3 OSDs per SSD). The affected nodes span all three failure domains, which makes me nervous that some Placement Groups in the pool have likely lost all three replicas. Because RBD images are striped into many objects spread across the Placement Groups, even a few missing PGs could touch almost every image, so I'm concerned I may have lost all the RBD volumes in this pool.
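A rough back-of-the-envelope model of that worry, offered as a hedged sketch: if each PG keeps one replica in each failure domain and CRUSH spreads PGs roughly uniformly over the OSDs within a domain, the chance that a given PG lost all three replicas is about the product of the lost fractions in each domain. The pg_num, the per-domain fractions, and the image size below are hypothetical, not figures from this cluster, and treating placement as independent and uniform is only an approximation of CRUSH.

```python
from math import prod
from typing import Sequence


def expected_lost_pgs(pg_num: int, lost_fraction_per_domain: Sequence[float]) -> float:
    """Expected number of PGs with every replica on a failed OSD.

    Assumes one replica per failure domain and uniform placement of PGs
    over the OSDs inside each domain (an approximation of CRUSH).
    """
    return pg_num * prod(lost_fraction_per_domain)


def prob_image_intact(num_objects: int, pg_num: int, lost_pgs: float) -> float:
    """Probability that an RBD image whose data lives in `num_objects`
    objects touches none of the lost PGs, treating each object's PG
    as an independent, uniform choice."""
    return (1.0 - lost_pgs / pg_num) ** num_objects


if __name__ == "__main__":
    # Hypothetical numbers: 4096 PGs, and the failed OSDs make up
    # 3/12, 2/12 and 2/12 of the OSDs in the three failure domains.
    pg_num = 4096
    lost = expected_lost_pgs(pg_num, [3 / 12, 2 / 12, 2 / 12])
    # A fully written 100 GiB image with 4 MiB objects has 25600 objects.
    p_ok = prob_image_intact(25600, pg_num, lost)
    print(f"expected lost PGs: {lost:.1f}")
    print(f"P(100 GiB image untouched): {p_ok:.3g}")
```

Under these made-up numbers roughly 28 PGs are expected to be gone, and the chance that a fully written 100 GiB image avoids all of them is effectively zero. Thin-provisioned images that have written fewer objects fare better, which is why the fear of losing most volumes is reasonable even when only a small share of PGs is missing.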

The obvious approach is to bring the OSDs back online (for at least one failure domain) so that there is at least one complete copy of the data, and then rebuild everything else. The problem is that the journals died along with the SSDs.
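To check whether recovering a single failure domain would be enough, one can list the PGs that would still have no copy on any running OSD. This is a minimal sketch, assuming you already have a PG-to-OSD mapping (for example from the acting sets reported by `ceph pg dump`; parsing that output is not shown and its format varies between releases). The PG ids and OSD numbers below are hypothetical. It only counts copies: it says nothing about whether an OSD restarted with a new, empty journal holds consistent data.

```python
from typing import Dict, Iterable, List, Set


def pgs_without_copy(pg_to_osds: Dict[str, Iterable[int]], unavailable: Set[int]) -> List[str]:
    """Return the ids of PGs whose every replica is on an unavailable OSD."""
    return sorted(
        pg for pg, osds in pg_to_osds.items()
        if all(osd in unavailable for osd in osds)
    )


if __name__ == "__main__":
    # Hypothetical PG -> acting-set mapping: OSDs 1-3 are in domain A,
    # 11-13 in domain B, 21-23 in domain C.
    acting = {"2.0": [1, 11, 21], "2.1": [2, 12, 22], "2.2": [3, 13, 23]}
    lost = {1, 2, 11, 12, 21}
    print(pgs_without_copy(acting, lost))              # → ['2.0']
    recovered = {1, 2, 3}                               # bring back domain A's OSDs
    print(pgs_without_copy(acting, lost - recovered))  # → []
```

Running the check with the set of OSDs you expect to revive shows whether that domain alone covers every PG before investing effort in the recovery.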

I can't find much published about recovering OSDs after losing the journal, except:

https://ceph.io/geen-categorie/ceph-recover-osds-after-ssd-journal-failure/

That post doesn't say whether the data is valid afterwards. I seem to recall that Inktank used to handle this situation and may have had a solution. At this point, I'll take any constructive advice.

Thank you in advance,
Mike
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx