Hi Mike,
On 13/4/22 at 9:29, PenguinOS wrote:
Hello,
My Ceph cluster with 3 nodes is showing a HEALTH_ERR, with the
following errors:
* OSD_SCRUB_ERRORS: 6 scrub errors
* PG_DAMAGED: Possible data damage: 6 pgs inconsistent
* CEPHADM_FAILED_DAEMON: 3 failed cephadm daemon(s)
* MON_CLOCK_SKEW: clock skew detected on mon.ceph3
* MON_DOWN: 1/3 mons down, quorum ceph2,ceph3
* OSD_NEARFULL: 4 nearfull osd(s)
* PG_NOT_DEEP_SCRUBBED: 2 pgs not deep-scrubbed in time
All 18 OSDs are up though, and I don't see any hard drive errors in any
server's dmesg logs.
The cluster's status page is currently showing: Scrubbing: Active
Is the problem recoverable?
I suggest you first fix the mon issues (see the example commands after
this list):
- Sync the nodes' clocks with NTP, so that MON_CLOCK_SKEW disappears
- Fix the down monitor (mon.ceph1?)
- Not sure about that CEPHADM_FAILED_DAEMON; I've never used cephadm,
but it doesn't look good.
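Something along these lines should help diagnose (an untested sketch;
mon.ceph1 is just my guess from your quorum line):

  # Check each mon's clock offset as seen by the cluster
  ceph time-sync-status

  # On every node, confirm NTP/chrony is actually syncing
  chronyc sources

  # List cephadm-managed daemons and their state, plus error details
  ceph orch ps
  ceph health detail

  # Once the host clock is fixed, try restarting the failed daemons
  ceph orch daemon restart mon.ceph1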
Then I would look into the nearfull OSDs: see if you can free up some
space or rebalance the OSDs so OSD_NEARFULL goes away (example below).
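For example (a sketch; the test- variant only prints what would change,
so run it first):

  # Per-OSD utilization, to spot the 4 nearfull ones
  ceph osd df tree

  # Dry-run a gentle reweight of the most-used OSDs, then apply it
  ceph osd test-reweight-by-utilization
  ceph osd reweight-by-utilization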
After that, check the scrub/inconsistent errors; the usual starting
point is below, and there are other emails about this in the list
archive.
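Something like this (<pool> and <pgid> are placeholders; make sure the
disks are actually healthy first, since a repair usually copies from
the replica Ceph considers authoritative):

  # Which PGs are inconsistent, and in which pool
  ceph health detail
  rados list-inconsistent-pg <pool>

  # What exactly is inconsistent inside one PG
  rados list-inconsistent-obj <pgid> --format=json-pretty

  # If it's a plain scrub mismatch, let Ceph repair the PG
  ceph pg repair <pgid>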
Cheers
Eneko Lacunza
Technical Director
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx