On Thu, Feb 23, 2017 at 09:49:21PM +0000, Scottix wrote:
> ceph version 10.2.5 (c461ee19ecbc0c5c330aca20f7392c9a00730367)
>
> We are seeing weird behavior and are not sure how to diagnose what could
> be going on. We started monitoring the overall_status from the json query
> and every once in a while we get a HEALTH_WARN for a minute or two.
>
> Monitoring logs:
> 02/23/2017 07:25:54 AM HEALTH_OK
> 02/23/2017 07:24:54 AM HEALTH_WARN
> 02/23/2017 07:23:55 AM HEALTH_OK
> 02/23/2017 07:22:54 AM HEALTH_OK
> ...
> 02/23/2017 05:13:55 AM HEALTH_OK
> 02/23/2017 05:12:54 AM HEALTH_WARN
> 02/23/2017 05:11:54 AM HEALTH_WARN
> 02/23/2017 05:10:54 AM HEALTH_OK
> 02/23/2017 05:09:54 AM HEALTH_OK
>
> When I check the mon leader logs there is no indication of an error or
> issue that could be occurring. Is there a way to find what is causing the
> HEALTH_WARN?

By leader logs, do you mean the cluster log (mon_cluster_log_file), or
the mon log (log_file)? E.g. /var/log/ceph/ceph.log vs
/var/log/ceph/ceph-mon.$ID.log.

Could you post the log entries for a time period between two HEALTH_OK
states with a HEALTH_WARN in the middle?

The reason for the WARN _should_ be included on the logged status line.

Alternatively, you should be able to just log the output of 'ceph -w'
for a while, and find the WARN status there as well.

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robbat2@xxxxxxxxxx
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
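Since the monitoring is already polling overall_status from the JSON output, one way to follow the advice to capture the WARN reason is to log the health summary entries alongside the status. The following is a minimal sketch, not a tested tool. It assumes Python 3.6+, a host where the `ceph` CLI works with admin credentials, and a Jewel-era `ceph status --format json` layout in which `health.overall_status` sits next to a `health.summary` list of `{"severity": ..., "summary": ...}` entries. The function names `warn_reasons`, `fetch_status` and `watch_health` are hypothetical helpers introduced here for illustration.

```python
import json
import subprocess
import time
from datetime import datetime


def warn_reasons(status):
    """Return (overall_status, reasons) from parsed `ceph status` JSON.

    ASSUMPTION: Jewel-era layout, i.e. status["health"]["overall_status"]
    plus a list of {"severity": ..., "summary": ...} dicts under
    status["health"]["summary"]. Missing keys fall back to defaults.
    """
    health = status.get("health", {})
    overall = health.get("overall_status", "UNKNOWN")
    reasons = [
        "%s: %s" % (item.get("severity", "?"), item.get("summary", ""))
        for item in health.get("summary", [])
    ]
    return overall, reasons


def fetch_status():
    """Run the ceph CLI once and parse its JSON output (needs a keyring)."""
    out = subprocess.run(
        ["ceph", "status", "--format", "json"],
        stdout=subprocess.PIPE,
        check=True,
    ).stdout
    return json.loads(out)


def watch_health(interval=10):
    """Poll forever; print the summary reasons whenever health is not OK."""
    while True:
        overall, reasons = warn_reasons(fetch_status())
        stamp = datetime.now().strftime("%m/%d/%Y %I:%M:%S %p")
        if overall == "HEALTH_OK":
            print(stamp, overall)
        else:
            print(stamp, overall, "; ".join(reasons) or "(no summary given)")
        time.sleep(interval)
```

Calling `watch_health(10)` prints one line per poll in the same timestamp style as the monitoring log above, with the summary text appended on non-OK polls. Polling more often than once a minute makes it easier to catch WARN windows that last only a minute or two.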