Re: Random Health_warn

John Spray <jspray@xxxxxxxxxx> · Thu, 23 Feb 2017 22:47:28 +0000

On Thu, Feb 23, 2017 at 9:49 PM, Scottix <scottix@xxxxxxxxx> wrote:
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> We are seeing some odd behavior and are not sure how to diagnose what could
> be going on. We started monitoring the overall_status from the JSON query,
> and every once in a while we get a HEALTH_WARN for a minute or two.
>
> Monitoring logs.
> 02/23/2017 07:25:54 AM HEALTH_OK
> 02/23/2017 07:24:54 AM HEALTH_WARN
> 02/23/2017 07:23:55 AM HEALTH_OK
> 02/23/2017 07:22:54 AM HEALTH_OK
> ...
> 02/23/2017 05:13:55 AM HEALTH_OK
> 02/23/2017 05:12:54 AM HEALTH_WARN
> 02/23/2017 05:11:54 AM HEALTH_WARN
> 02/23/2017 05:10:54 AM HEALTH_OK
> 02/23/2017 05:09:54 AM HEALTH_OK
>
> When I check the mon leader logs there is no indication of an error or
> of any issue that could be occurring. Is there a way to find out what is
> causing the HEALTH_WARN?

Possibly not, unless you capture the rest of the health output (not just
the overall status) at the same moment you sample the OK/WARN status.

Internally, the OK/WARN/ERROR health state is generated on-demand by
applying a bunch of checks to the state of the system when the user
runs the health command -- the system doesn't know it's in a warning
state until it's asked.  Often you will see a corresponding log
message, but not necessarily.
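
As a rough illustration of capturing the reasons at the same moment as the
status, here is a minimal Python sketch. It assumes that
`ceph health detail --format json` on this release returns an
"overall_status" string plus a "summary" list of entries with "severity"
and "summary" keys; check that against your own output before relying on
it. The function names (fetch_health, describe, format_record, main) are
made up for the example and are not part of Ceph.

```python
import json
import subprocess
import time
from datetime import datetime


def fetch_health():
    """Run `ceph health detail` and return the parsed JSON document."""
    out = subprocess.check_output(
        ["ceph", "health", "detail", "--format", "json"]
    )
    return json.loads(out)


def describe(health):
    """Return (status, reasons) from one health document.

    Assumption: the document has an "overall_status" string and a
    "summary" list of {"severity": ..., "summary": ...} entries.
    Missing keys fall back to placeholders instead of raising.
    """
    status = health.get("overall_status", "UNKNOWN")
    reasons = [
        "{}: {}".format(item.get("severity", "?"), item.get("summary", "?"))
        for item in health.get("summary", [])
    ]
    return status, reasons


def format_record(now, health):
    """Build one log line; append the reasons whenever status is not OK."""
    status, reasons = describe(health)
    line = "{} {}".format(now.strftime("%m/%d/%Y %I:%M:%S %p"), status)
    if status != "HEALTH_OK":
        line += " | " + ("; ".join(reasons) or "no summary entries returned")
    return line


def main(interval=60):
    """Poll forever, printing one line per sample."""
    while True:
        print(format_record(datetime.now(), fetch_health()), flush=True)
        time.sleep(interval)
```

Calling main() from a small wrapper script would produce the same
one-line-per-minute log as above, with the summary text attached to each
HEALTH_WARN sample. Because the status and the reasons come from a single
command invocation, they describe the same evaluation of the health
checks.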

John

> Best,
> Scott
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>