On Fri, Jan 13, 2012 at 9:24 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> wip-osd-pg-stats adds a time stamp and epoch number to the pg_stat_t
> struct. The epoch is updated when the mapping changes, and the time stamp
> is updated when the pg state changes.
>
> The state timestamp is the interesting one, since it'll make it easier to
> identify PGs that are stuck in, say, down or peering states for long
> periods of time.

Thumbs up!

> What it doesn't help with is when the pg state is toggling between two
> undesirable states (the stamp will still get updated). In practice, what
> we probably care about is active vs not active, and degraded vs not
> degraded. We could add additional time stamps for those, but that may be
> overkill.

I think we'll want these, or at least more than we've got right now.
I'm not sure what a good way to track "stuck" PGs from an external
tool would be without building a lot of smarts into it (tracking the
states a PG can take), while it would be easy to serve that need by
adding an inactive_stamp and a degraded_stamp.

> Previously we had planned on doing this sort of monitoring using an
> external agent, but that's kind of a pain, duplicates storage, etc. This
> is simple to add into the pg_stat_t update logic.
>
> I expect this will be rebased on top of the new encoding strategy stuff so
> that the object versioning is backward and forwards compatible, so don't
> worry about that part here.
>
> https://github.com/NewDreamNetwork/ceph/commit/f43282796af50d760a620970ad691e0a20bcf178
>
> Thoughts?
> sage
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
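
To make the inactive_stamp / degraded_stamp idea concrete, here is a rough
sketch of the update logic as a toy Python model. It is not the real
pg_stat_t code. The state bits, the field name state_stamp, and the helper
names are all made up for illustration, and the semantics are an assumption:
each extra stamp records when the PG last *entered* the inactive or degraded
condition.

```python
from dataclasses import dataclass

# Hypothetical state bits for illustration only; the real PG state flags
# are defined in Ceph's C++ sources and are not reproduced here.
ACTIVE = 1 << 0
DEGRADED = 1 << 1
PEERING = 1 << 2
DOWN = 1 << 3


@dataclass
class PGStat:
    """Toy stand-in for the fields discussed in this thread."""
    state: int = 0
    state_stamp: float = 0.0     # bumped on every state change
    inactive_stamp: float = 0.0  # assumed: when the PG last became inactive
    degraded_stamp: float = 0.0  # assumed: when the PG last became degraded


def update_state(stat: PGStat, new_state: int, now: float) -> None:
    """Apply a state change, touching only the stamps whose condition flipped."""
    if new_state == stat.state:
        return
    was_active = bool(stat.state & ACTIVE)
    is_active = bool(new_state & ACTIVE)
    was_degraded = bool(stat.state & DEGRADED)
    is_degraded = bool(new_state & DEGRADED)

    # Only the active -> inactive edge moves inactive_stamp, so toggling
    # between two inactive states (e.g. down <-> peering) leaves it alone.
    if was_active and not is_active:
        stat.inactive_stamp = now
    # Same idea for the not-degraded -> degraded edge.
    if is_degraded and not was_degraded:
        stat.degraded_stamp = now

    stat.state = new_state
    stat.state_stamp = now


def stuck_inactive_for(stat: PGStat, now: float) -> float:
    """Seconds the PG has been continuously inactive (0.0 if it is active)."""
    if stat.state & ACTIVE:
        return 0.0
    return now - stat.inactive_stamp
```

With this model, a PG that starts in peering at t=0 and flips to down at
t=100 and back to peering at t=200 has state_stamp == 200 but
inactive_stamp == 0, so an external tool can report it as inactive for the
whole period without knowing anything about the individual states.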