On Fri, 27 Jan 2012, Gregory Farnum wrote:
> On Fri, Jan 27, 2012 at 1:32 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > Please review.
> >
> > If the monitor sees an osdmap go by where nodes go down (or up) it will
> > scan its pg_map and mark any pg whose primary is down as 'stale'. If/when
> > the pg recovers, that will get refreshed. If not, the admin will know
> > something is up.
> Hmm. Without any kind of timeout this flag will get set every time an
> OSD goes down -- the replicas won't alert the new primary until after
> they get the map marking their old primary down, and this check will
> be run synchronously with the generation of the map marking the OSD
> down.
> The "spurious" stale marker on each PG isn't a big deal (it'll
> disappear after a few seconds), but if we're going to set HEALTH_WARN
> based on it, that seems like a bit much to me.

My thought is that as soon as we add the time stamps to the state
transition, it'll only warn once things are stale for a while. We already
have the same problem with degraded/peering/etc with the health checks...

sage

> >
> > We'll soon be adding the last_active, last_clean, and now last_unstale (?)
> > fields so that bigger alarms can go off when the pg stays stale for more
> > than a few seconds...
> Yeah; I think we want to use this to trigger big warnings, but not to
> trigger warnings without it.
> -Greg
> >
> > sage
> > --
> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
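
For illustration only, here is a rough Python sketch of the behaviour discussed in this thread: mark a pg 'stale' as soon as a map shows its primary down, but only raise a warning once it has stayed stale past a grace period. This is not the actual monitor code. The `PG` class, the function names, and the 30-second grace value are all assumptions made up for the example; only the 'stale' state and the tentative `last_unstale` field name come from the messages above.

```python
from dataclasses import dataclass, field


@dataclass
class PG:
    """Hypothetical stand-in for one pg_map entry (not Ceph's real structure)."""
    pgid: str
    primary: int                      # OSD id of the pg's primary
    states: set = field(default_factory=set)
    last_unstale: float = 0.0         # last time the pg was known not to be stale


def mark_stale(pg_map, up_osds):
    """On a new osdmap, flag every pg whose primary is not up as 'stale'."""
    for pg in pg_map.values():
        if pg.primary not in up_osds:
            pg.states.add("stale")


def refresh(pg, now, primary=None):
    """A fresh report for the pg clears the flag and records the time."""
    if primary is not None:
        pg.primary = primary
    pg.states.discard("stale")
    pg.last_unstale = now


def stale_warnings(pg_map, now, grace=30.0):
    """Return ids of pgs that have been stale for longer than `grace` seconds.

    The grace value is an assumed placeholder, not a Ceph default.
    """
    return sorted(
        pg.pgid
        for pg in pg_map.values()
        if "stale" in pg.states and now - pg.last_unstale > grace
    )


if __name__ == "__main__":
    pgs = {
        "1.0": PG("1.0", primary=0, last_unstale=100.0),
        "1.1": PG("1.1", primary=1, last_unstale=100.0),
    }
    mark_stale(pgs, up_osds={1})           # osd.0 went down
    print(stale_warnings(pgs, now=105.0))  # still within the grace period
    print(stale_warnings(pgs, now=140.0))  # stale for longer than the grace period
```

With this split, the "spurious" marker that appears on every OSD failure is still recorded, but it would not by itself be enough to trip a health warning.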