On Fri, Nov 21, 2014 at 1:54 AM, Michael Sevilla <mikesevilla3@xxxxxxxxx> wrote:
> Hi. Where do the MDSHealthMetrics in MMDSBeacon (e.g.,
> MDS_HEALTH_TRIM) show up in the monitors? When we run ceph -s? I
> suspect I don't see them because I'd have to run ceph -s at the exact
> moment when the MDS is trimming. Is there an easier way to see these
> warnings or is there some debug flag I need to turn on?

In the specific case of MDS_HEALTH_TRIM, this is aimed at detecting systems that are trimming at a pathologically bad rate (or perhaps stuck entirely due to a bug), so in such an unhealthy system we would expect the state to stick around for a while -- it shouldn't just be a "blink and you miss it" status. However, you would have to look at the status at some point during the unhealthy period: there's currently nothing in the cluster log for that health check.

For the new MDS health warnings, we have some overlapping coverage between health indications (i.e. things that show up in "ceph -s") and cluster log messages (i.e. things that show up in "ceph -w").

There is a general problem here with the health reporting (not just for the MDS checks): it is only generated on demand when someone looks at it -- e.g. things like clock skew also only show up if you happen to run ceph -s at the right moment. Internally this corresponds to the various get_health() functions in the mon subsystems.

It would be good to have a generic way for health indicators (MDS and beyond) to emit clog messages when they appear and disappear, so that you don't have to look at the status at the right moment.
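To make the "emit a message when an indicator appears or disappears" idea concrete, here is a rough Python sketch. It is not Ceph code: diff_health, HealthLogger and the summary text are all hypothetical, and it assumes each health indicator carries a stable code (such as MDS_HEALTH_TRIM) that can be compared between two polls.

```python
# Hypothetical sketch only: none of these names exist in Ceph.
from typing import Dict, List


def diff_health(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return log lines for health checks that appeared or cleared.

    Both arguments map an (assumed) stable check code to a summary string.
    """
    lines = []
    # Codes present now but not before: the check has just been raised.
    for code in sorted(current.keys() - previous.keys()):
        lines.append(f"health check raised: {code}: {current[code]}")
    # Codes present before but not now: the check has cleared.
    for code in sorted(previous.keys() - current.keys()):
        lines.append(f"health check cleared: {code}")
    return lines


class HealthLogger:
    """Remembers the last set of checks and records only the transitions."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}
        self.log: List[str] = []  # stand-in for the cluster log

    def update(self, current: Dict[str, str]) -> None:
        self.log.extend(diff_health(self.active, current))
        self.active = dict(current)


logger = HealthLogger()
trim = {"MDS_HEALTH_TRIM": "placeholder summary"}  # made-up summary text
logger.update(trim)   # check appears: one "raised" line
logger.update(trim)   # unchanged: nothing logged
logger.update({})     # check disappears: one "cleared" line
```

The comparison is keyed on the code rather than the summary text, so a summary whose wording changes between polls would not be logged as a new event.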
That would be a little hard to implement at the moment because the health messages are just freeform strings, but I put some notes on cleaning up health reporting here a while back: http://tracker.ceph.com/issues/7192

Cheers,
John