Hi Wido!

On Wed, 9 Dec 2015, Wido den Hollander wrote:
> Hi,
>
> I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR
> if >= X PGs are stuck non-active.
>
> This works for me now, but I would like to add a timer that a PG has to
> be inactive for more than Y seconds.
>
> The PGMap contains "last_active" and "last_clean", but these timestamps
> are never updated. So I can't query for last_active =< (now() - 300) for
> example.
>
> On a idle test cluster I have a PG for example:
>
> "last_active": "2015-12-09 02:32:31.540712",
>
> It's currently 08:53:56 here, so I can't check against last_active.
>
> What would a good way be to see for how long a PG has been inactive?

It sounds like maybe the current code is subtly broken:

	https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566

The last_active/last_clean etc. stamps should be fresh within
osd_pg_stat_report_interval_max seconds...

sage
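
For reference, the check being discussed (last_active older than now minus Y seconds) could be sketched roughly as below. This is only an illustration in Python, not the PGMonitor.cc code: the function names and the {pgid: stamp} mapping are hypothetical, and it assumes the stamps use the layout shown in the example and share a clock with "now".

```python
from datetime import datetime, timedelta

# Layout of the stamp quoted in the example: "2015-12-09 02:32:31.540712"
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def inactive_longer_than(last_active, now, threshold_seconds):
    """True if last_active is more than threshold_seconds older than now."""
    stamp = datetime.strptime(last_active, STAMP_FORMAT)
    return (now - stamp) > timedelta(seconds=threshold_seconds)


def count_stuck_inactive(pg_stamps, now, threshold_seconds):
    """Count PGs whose last_active stamp is older than the threshold.

    pg_stamps is a hypothetical {pgid: last_active string} mapping.
    """
    return sum(
        1
        for stamp in pg_stamps.values()
        if inactive_longer_than(stamp, now, threshold_seconds)
    )


# Values from the example: stamp 02:32:31, wall clock 08:53:56, Y = 300.
now = datetime(2015, 12, 9, 8, 53, 56)
stuck = inactive_longer_than("2015-12-09 02:32:31.540712", now, 300)
```

With the stale stamp from the example, the PG is flagged whether or not it is actually active, so a check like this is only meaningful if the stamps are refreshed within osd_pg_stat_report_interval_max seconds.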