Re: How to clear Health Warning status?

* Monitors now have a config option ``mon_osd_warn_num_repaired``, 10 by default.
  If any OSD has repaired more than this many I/O errors in stored data, an
  ``OSD_TOO_MANY_REPAIRS`` health warning is generated (see the example just below).
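
If you want to check or adjust that threshold while you investigate, something along these lines should work on Nautilus or later (the option name is taken from the release notes above; setting it at the global level sidesteps the question of which daemon actually reads it):

# ceph config get mon mon_osd_warn_num_repaired
# ceph config set global mon_osd_warn_num_repaired 20

Raising the threshold only quiets the warning, of course; it does nothing about the underlying read errors.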

Look at `dmesg` and the underlying drive’s SMART counters.  You almost certainly have a drive that is failing and should be replaced.
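
For example, assuming the failing OSD's data device is /dev/sdX (a placeholder; `ceph-volume lvm list` on the OSD host will show which device actually backs osd.67):

# dmesg -T | grep -i 'i/o error'
# smartctl -a /dev/sdX

SMART attributes such as Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable climbing over time are the usual signs of a dying disk.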

In releases prior to Nautilus, an unrecovered read error would often cause the OSD to crash, e.g. when a drive slips a bad block.

— aad

> On Oct 9, 2020, at 4:58 PM, Tecnología CHARNE.NET <tecno@xxxxxxxxxx> wrote:
> 
> Hello!
> 
> Today, I started the morning with a WARNING STATUS on our Ceph cluster.
> 
> 
> # ceph health detail
> 
> HEALTH_WARN Too many repaired reads on 1 OSDs
> 
> [WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 1 OSDs
> 
>     osd.67 had 399911 reads repaired
> 
> 
> I ran "ceph osd out 67" and the PGs were migrated to other OSDs.
> 
> I stopped the osd.67 daemon, inspected the logs, etc...
> 
> Then I started the daemon and ran "# ceph osd in 67".
> 
> The OSD started backfilling some PGs and no other errors appeared for the rest of the day, but the warning status still remains.
> 
> Can I clear it? Should I remove the OSD and start with a new one?
> 
> Thanks in advance for your time!
> 
> 
> Javier.-
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



