Hi Mike,
On 13/4/22 at 9:29, PenguinOS wrote:
Hello,
My Ceph cluster with 3 nodes is showing a HEALTH_ERR, with the
following errors:
* OSD_SCRUB_ERRORS: 6 scrub errors
* PG_DAMAGED: Possible data damage: 6 pgs inconsistent
* CEPHADM_FAILED_DAEMON: 3 failed cephadm daemon(s)
* MON_CLOCK_SKEW: clock skew detected on mon.ceph3
* MON_DOWN: 1/3 mons down, quorum ceph2,ceph3
* OSD_NEARFULL: 4 nearfull osd(s)
* PG_NOT_DEEP_SCRUBBED: 2 pgs not deep-scrubbed in time
All 18 OSDs are up though, and I don't see any hard drive errors in any
server's dmesg logs.
The cluster's status page is currently showing: Scrubbing: Active
Is the problem recoverable?
I suggest you first fix the mon issues (see the example commands after
this list):
- Sync the nodes' clocks with NTP, so that MON_CLOCK_SKEW disappears
- Fix the down monitor (mon.ceph1?)
- Not sure about that CEPHADM_FAILED_DAEMON; I've never used cephadm,
but it doesn't look good.
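Something along these lines should help diagnose (an untested sketch;
mon.ceph1 is just my guess from your quorum line):

  # Check each mon's clock offset as seen by the cluster
  ceph time-sync-status

  # On every node, confirm NTP/chrony is actually syncing
  chronyc sources

  # List cephadm-managed daemons and their state, plus error details
  ceph orch ps
  ceph health detail

  # Once the host clock is fixed, try restarting the failed daemons
  ceph orch daemon restart mon.ceph1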
Then I would look into the nearfull OSDs: see if you can free up some
space or rebalance the OSDs so OSD_NEARFULL goes away (example below).
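For example (a sketch; the test- variant only prints what would change,
so run it first):

  # Per-OSD utilization, to spot the 4 nearfull ones
  ceph osd df tree

  # Dry-run a gentle reweight of the most-used OSDs, then apply it
  ceph osd test-reweight-by-utilization
  ceph osd reweight-by-utilization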
After that, check the scrub/inconsistent errors; the usual starting
point is below, and there are other emails about this in the list
archive.
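Something like this (<pool> and <pgid> are placeholders; make sure the
disks are actually healthy first, since a repair usually copies from
the replica Ceph considers authoritative):

  # Which PGs are inconsistent, and in which pool
  ceph health detail
  rados list-inconsistent-pg <pool>

  # What exactly is inconsistent inside one PG
  rados list-inconsistent-obj <pgid> --format=json-pretty

  # If it's a plain scrub mismatch, let Ceph repair the PG
  ceph pg repair <pgid>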
Cheers
Eneko Lacunza
Technical Director
Binovo IT Human Project
Tel. +34 943 569 206 | https://www.binovo.es
Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
https://www.youtube.com/user/CANALBINOVO
https://www.linkedin.com/company/37269706/
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx