On Thu, Feb 23, 2017 at 9:49 PM, Scottix <scottix@xxxxxxxxx> wrote:
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> We are seeing a weird behavior or not sure how to diagnose what could be
> going on. We started monitoring the overall_status from the json query and
> every once in a while we would get a HEALTH_WARN for a minute or two.
>
> Monitoring logs.
> 02/23/2017 07:25:54 AM HEALTH_OK
> 02/23/2017 07:24:54 AM HEALTH_WARN
> 02/23/2017 07:23:55 AM HEALTH_OK
> 02/23/2017 07:22:54 AM HEALTH_OK
> ...
> 02/23/2017 05:13:55 AM HEALTH_OK
> 02/23/2017 05:12:54 AM HEALTH_WARN
> 02/23/2017 05:11:54 AM HEALTH_WARN
> 02/23/2017 05:10:54 AM HEALTH_OK
> 02/23/2017 05:09:54 AM HEALTH_OK
>
> When I check the mon leader logs there is no indication of an error or
> issues that could be occurring. Is there a way to find what is causing the
> HEALTH_WARN?

Possibly not, unless you grab more than just the overall status at the
same time as you're grabbing the OK/WARN status.

Internally, the OK/WARN/ERROR health state is generated on demand by
applying a bunch of checks to the state of the system when the user runs
the health command -- the system doesn't know it's in a warning state
until it's asked. Often you will see a corresponding log message, but
not necessarily.

John

> Best,
> Scott
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
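
For what it's worth, here is a rough sketch of what "grabbing more than just the overall status" could look like in a polling script. It is only an illustration: it assumes the health JSON from this release carries a `summary` list of `{severity, summary}` entries next to `overall_status`, and the helper names (`describe_health`, `fetch_health`, `poll`) and the 60-second interval are made up for the example. Check the key names against what your cluster actually returns before relying on it.

```python
import json
import subprocess
import time
from datetime import datetime


def describe_health(health):
    """Return (status, reasons) from a parsed health JSON object.

    Assumption: the object has 'overall_status' and a 'summary' list of
    {'severity': ..., 'summary': ...} entries. Missing keys are tolerated
    so the poller keeps running if the layout differs.
    """
    status = health.get("overall_status", "UNKNOWN")
    reasons = []
    for item in health.get("summary") or []:
        reasons.append(
            "{}: {}".format(item.get("severity", "?"), item.get("summary", "?"))
        )
    return status, reasons


def fetch_health():
    """Run the health command once and parse its JSON output."""
    out = subprocess.check_output(["ceph", "health", "--format", "json"])
    return json.loads(out.decode("utf-8"))


def poll(interval=60):
    """Log the status every `interval` seconds, with reasons when not OK."""
    while True:
        status, reasons = describe_health(fetch_health())
        stamp = datetime.now().strftime("%m/%d/%Y %I:%M:%S %p")
        if status == "HEALTH_OK" or not reasons:
            print("{} {}".format(stamp, status))
        else:
            # The reasons come from the same response as the status,
            # so they describe the same on-demand evaluation.
            print("{} {} ({})".format(stamp, status, "; ".join(reasons)))
        time.sleep(interval)


if __name__ == "__main__":
    poll()
```

Because the health state is computed when the command runs, the reasons have to come from the same response as the status; running a second query after noticing a HEALTH_WARN may already be too late for a warning that only lasts a minute or two.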