On Fri, Jan 27, 2012 at 1:32 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> Please review.
>
> If the monitor sees an osdmap go by where nodes go down (or up) it will
> scan its pg_map and mark any pg whose primary is down as 'stale'. If/when
> the pg recovers, that will get refreshed. If not, the admin will know
> something is up.

Hmm. Without any kind of timeout this flag will get set every time an
OSD goes down: the replicas won't alert the new primary until after
they get the map marking their old primary down, and this check will
be run synchronously with the generation of the map marking the OSD
down. The "spurious" stale marker on each PG isn't a big deal (it'll
disappear after a few seconds), but if we're going to set HEALTH_WARN
based on it, that seems like a bit much to me.

> We'll soon be adding the last_active, last_clean, and now last_unstale (?)
> fields so that bigger alarms can go off when the pg stays stale for more
> than a few seconds...

Yeah; I think we want to use this to trigger big warnings, but not to
trigger warnings without it.
-Greg

>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
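
To make the timeout point concrete, here is a rough sketch of what I have in mind. It is not the monitor code: PGStat, mark_stale, refresh, health, and the 30-second STALE_GRACE_SECONDS value are all made-up names and numbers for illustration, and it assumes last_unstale is bumped whenever a fresh pg stat report arrives. The stale flag is still set synchronously with the map change, but the health check only warns once a pg has stayed stale past the grace period.

```python
from dataclasses import dataclass
from typing import Dict, Set

# Hypothetical grace period; the thread only says "more than a few seconds".
STALE_GRACE_SECONDS = 30.0


@dataclass
class PGStat:
    """Illustrative stand-in for one pg_map entry (not the real structure)."""
    primary: int                 # OSD id of the current primary
    stale: bool = False          # set when the primary is seen going down
    last_unstale: float = 0.0    # assumed: time of the last fresh stat report


def mark_stale(pg_map: Dict[str, PGStat], down_osds: Set[int]) -> None:
    """Flag every pg whose primary is in the set of OSDs marked down."""
    for pg in pg_map.values():
        if pg.primary in down_osds:
            pg.stale = True


def refresh(pg: PGStat, new_primary: int, now: float) -> None:
    """A fresh report from the new primary clears the stale flag."""
    pg.primary = new_primary
    pg.stale = False
    pg.last_unstale = now


def health(pg_map: Dict[str, PGStat], now: float,
           grace: float = STALE_GRACE_SECONDS) -> str:
    """Warn only for pgs that have been stale for longer than the grace period."""
    stuck = [pgid for pgid, pg in pg_map.items()
             if pg.stale and now - pg.last_unstale > grace]
    return "HEALTH_WARN" if stuck else "HEALTH_OK"
```

With something like this, an OSD going down sets the flag on its pgs right away, but health() keeps returning HEALTH_OK until the grace period has passed without a refresh. The caveat is that it relies on stat reports arriving regularly; a pg whose last report is already older than the grace period would warn immediately.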