Thanks, Anthony, for your quick response.
I'll remove the disk and replace it.
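In the meantime, if our release supports health mutes, I understand something like this should keep the cluster out of WARN until the replacement is in (the TTL here is just an example):

# ceph health mute OSD_TOO_MANY_REPAIRS 1w

And the rough removal sequence I have in mind, as far as I understand the usual manual workflow (IDs adjusted for our setup):

# ceph osd out 67
(wait for backfill to finish)
# systemctl stop ceph-osd@67
# ceph osd purge 67 --yes-i-really-mean-it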
Javier.-
On 10/10/20 at 00:17, Anthony D'Atri wrote:
* Monitors now have a config option ``mon_osd_warn_num_repaired``, 10 by default.
If any OSD has repaired more than this many I/O errors in stored data, an
``OSD_TOO_MANY_REPAIRS`` health warning is generated.
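If you just want to inspect or temporarily raise that threshold while you sort out the hardware, something along these lines should do it (raising it only hides the symptom, and 100 is an arbitrary example):

# ceph config get mon mon_osd_warn_num_repaired
# ceph config set mon mon_osd_warn_num_repaired 100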
Look at `dmesg` and the underlying drive’s SMART counters. You almost certainly have a drive that is failing and should be replaced.
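For example, on the host carrying osd.67 (the device path below is a placeholder for whichever disk backs that OSD):

# dmesg -T | grep -i 'i/o error\|medium error'
# smartctl -a /dev/sdX

In the SMART output, growing Reallocated_Sector_Ct, Current_Pending_Sector, or Offline_Uncorrectable counts are the usual smoking gun.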
In releases prior to Nautilus, an unrecovered read error would often cause the OSD to crash, e.g. from a drive slipping a bad block.
— aad
On Oct 9, 2020, at 4:58 PM, Tecnología CHARNE.NET <tecno@xxxxxxxxxx> wrote:
Hello!
Today, I started the morning with a WARNING STATUS on our Ceph cluster.
# ceph health detail
HEALTH_WARN Too many repaired reads on 1 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 1 OSDs
osd.67 had 399911 reads repaired
I ran "ceph osd out 67" and the PGs were migrated to other OSDs.
I stopped the osd.67 daemon, inspected the logs, etc...
Then I started the daemon again and ran "ceph osd in 67".
The OSD started backfilling some PGs, and no other errors appeared for the rest of the day, but the warning status still remains.
Can I clear it? Should I remove the OSD and start with a new one?
Thanks in advance for your time!
Javier.-
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx