I took a quick look at the get_health() methods in the Monitor after our discussion this morning:

- OSDMonitor::get_health() looks at the pool stats for a few things; I think these can be safely/easily moved to PGMap::get_health() (so that they will run in ceph-mgr).
- Then it'll be an easy change to calculate the health and detail sets in encode_pending as each OSDMap is published.
- MgrStatMonitor is already persisting the mgr health messages.
- MDSMonitor is also strictly a function of the FSMap, so it'd be easy to move to encode_pending.
- Monitor::get_health() has some odds and ends we can either leave in place or improve (e.g., time skew checks). Not sure it matters much.

My main question is whether you had specific thoughts about how to identify warnings so that we can note when they appear and disappear. We can just go by the unique strings, but then you'll see something like

  1 osd(s) down
  ...
  1 osd(s) down cleared
  2 osd(s) down
  ...

(or whatever we make the messages for cleared warnings look like).

Should we associate a 'tag' with each message that is used to identify it, so that, for example, "%d osd down" for any number of OSDs is considered the "same" message and we log when it changes but don't say it has cleared?

sage
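
To make the tag idea a bit more concrete, here is a rough sketch in Python (not actual Ceph code) of how comparing two health snapshots by tag rather than by message string might behave. The function name diff_health, the tag "OSD_DOWN", and the "new:" / "update:" / "cleared:" log prefixes are all made up for illustration; the real tag names and log format are exactly what is still open for discussion.

```python
from typing import Dict, List


def diff_health(prev: Dict[str, str], curr: Dict[str, str]) -> List[str]:
    """Compare two health snapshots keyed by a stable tag.

    Each snapshot maps a hypothetical tag (e.g. "OSD_DOWN") to the
    formatted message for that warning (e.g. "1 osd(s) down").
    Returns the log lines describing what changed between them.
    """
    lines: List[str] = []

    # Warnings present now: either newly raised, or the same tag
    # with a different message (e.g. the OSD count changed).
    for tag in sorted(curr):
        if tag not in prev:
            lines.append(f"new: {curr[tag]}")
        elif prev[tag] != curr[tag]:
            lines.append(f"update: {curr[tag]}")

    # Warnings whose tag is gone entirely are the only ones
    # reported as cleared.
    for tag in sorted(prev):
        if tag not in curr:
            lines.append(f"cleared: {prev[tag]}")

    return lines


if __name__ == "__main__":
    before = {"OSD_DOWN": "1 osd(s) down"}
    after = {"OSD_DOWN": "2 osd(s) down"}
    # The tag is unchanged, so this is logged as an update, not as
    # "1 osd(s) down cleared" followed by a new "2 osd(s) down".
    print(diff_health(before, after))
```

With plain string matching, the same transition would look like one warning clearing and an unrelated one appearing, which is the noise the tag is meant to avoid.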