(Ceph Octopus) Repairing a neglected Ceph cluster - Degraded Data Redundancy, all PGs degraded, undersized, not scrubbed in time

I've inherited a Ceph Octopus cluster that looks like it needs urgent maintenance before data loss sets in. I have the most Ceph experience of anyone on hand, which isn't saying much; most of these ops and repair tasks are new to me.

Ceph health output looks like this:

HEALTH_WARN Degraded data redundancy: 3640401/8801868 objects degraded (41.359%),
 128 pgs degraded, 128 pgs undersized; 128 pgs not deep-scrubbed in time;
 128 pgs not scrubbed in time
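
To make the numbers above easier to sanity-check, here is a minimal Python sketch that parses the health summary and recomputes the degraded percentage. The helper name `parse_health` is hypothetical (it is not part of any Ceph tooling), and the string is simply the output above joined onto one line.

```python
import re

# Health summary copied from this message, joined onto one line.
HEALTH = (
    "HEALTH_WARN Degraded data redundancy: 3640401/8801868 objects degraded (41.359%), "
    "128 pgs degraded, 128 pgs undersized; 128 pgs not deep-scrubbed in time; "
    "128 pgs not scrubbed in time"
)


def parse_health(summary):
    """Hypothetical helper: pull out degraded/total object counts and PG counts per state."""
    match = re.search(r"(\d+)/(\d+) objects degraded", summary)
    degraded, total = int(match.group(1)), int(match.group(2))
    # Each "<n> pgs <state>" item ends at a comma, a semicolon, or the end of the string.
    pg_states = {
        state: int(count)
        for count, state in re.findall(r"(\d+) pgs ([a-z\- ]+?)(?=[,;]|$)", summary)
    }
    return degraded, total, pg_states


degraded, total, pg_states = parse_health(HEALTH)
percent_degraded = 100.0 * degraded / total
print(f"{percent_degraded:.3f}% of object copies degraded")
for state, count in pg_states.items():
    print(f"{count} PGs {state}")
```

The same 128 PGs appear under every state, which is why I read this as the whole pool being affected rather than a subset of placement groups.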

ceph -s output: https://termbin.com/i06u

The crush rule 'cephfs.media' is here: https://termbin.com/2klmq

So, it seems that all PGs in the main pool are in a 'warning' state. The pool is erasure coded and spans 11 TiB across 4 OSDs, of which around 6.4 TiB is used. The Ceph services themselves seem happy: they're stable and have quorum, and I can access the web panel fine. The block devices are of different sizes and types (2 large spinners of different sizes, and 2 identical SSDs).

I would welcome any pointers on the steps needed to bring this back to full health. If it's undersized, can I simply add another block device/OSD? Or would adjusting some config setting get it to rebalance successfully? (The rebalance jobs have been stuck at 0% for weeks.)
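
My working assumption about the 'undersized' state, which I'd be glad to have corrected: an erasure-coded PG needs all k+m shards placed in distinct failure-domain buckets, so if the CRUSH rule asks for more buckets than exist, every PG stays undersized. A minimal Python sketch of that check follows. The function name is hypothetical, and k, m, and the bucket count are placeholders, since I haven't confirmed the erasure-code profile values for this pool.

```python
def missing_failure_domains(k, m, buckets_available):
    """Return how many more failure-domain buckets are needed to place all k+m shards.

    k and m are the data and coding chunk counts of the erasure-code profile;
    buckets_available is the number of distinct buckets (e.g. hosts or OSDs)
    of the type the CRUSH rule selects across.
    """
    return max(0, (k + m) - buckets_available)


# Placeholder values only, NOT read from this cluster.
shortfall = missing_failure_domains(k=3, m=2, buckets_available=4)
print(f"buckets still needed: {shortfall}")
```

If that reasoning holds, adding an OSD would only help when the shortfall is in the bucket type the rule actually uses.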

Thank you for your time reading this message.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


