On 12/09/2015 02:50 PM, Sage Weil wrote:
> Hi Wido!
>
> On Wed, 9 Dec 2015, Wido den Hollander wrote:
>> Hi,
>>
>> I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR
>> if >= X PGs are stuck non-active.
>>
>> This works for me now, but I would like to add a timer so that a PG has
>> to be inactive for more than Y seconds.
>>
>> The PGMap contains "last_active" and "last_clean", but these timestamps
>> are never updated. So I can't query for last_active <= (now() - 300) for
>> example.
>>
>> On an idle test cluster I have a PG for example:
>>
>> "last_active": "2015-12-09 02:32:31.540712",
>>
>> It's currently 08:53:56 here, so I can't check against last_active.
>>
>> What would a good way be to see for how long a PG has been inactive?
>
> It sounds like maybe the current code is subtly broken:
>
> https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566
>
> The last_active/clean etc should be fresh within
> osd_pg_stat_report_interval_max seconds...
>

Indeed, that seems broken. I created an issue for it:
http://tracker.ceph.com/issues/14028

I'm not sure where to start (yet).

> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
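
For reference, the check discussed in this thread could be sketched roughly as follows. This is only an illustration, not the actual PGMonitor.cc patch: PgStat, Health, count_stuck_inactive and evaluate are hypothetical names, and PgStat is a simplified stand-in rather than Ceph's real PG stat structure. It also assumes that last_active is refreshed regularly while a PG is active, which is exactly what appears to be broken per the issue above.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

using Clock = std::chrono::system_clock;

// Hypothetical, simplified per-PG record (NOT Ceph's real PG stat type).
struct PgStat {
  bool active;                    // is the PG currently active?
  Clock::time_point last_active;  // last time the PG was reported active
};

enum class Health { OK, ERR };

// Count PGs that are non-active and were last active at or before
// (now - grace), i.e. last_active <= now() - Y.
std::size_t count_stuck_inactive(const std::vector<PgStat>& pgs,
                                 Clock::time_point now,
                                 std::chrono::seconds grace) {
  std::size_t stuck = 0;
  for (const auto& pg : pgs) {
    if (!pg.active && pg.last_active <= now - grace) {
      ++stuck;
    }
  }
  return stuck;
}

// Report ERR once at least max_stuck PGs (the "X" above) have been
// inactive for the grace period (the "Y" above).
Health evaluate(const std::vector<PgStat>& pgs,
                Clock::time_point now,
                std::chrono::seconds grace,
                std::size_t max_stuck) {
  return count_stuck_inactive(pgs, now, grace) >= max_stuck ? Health::ERR
                                                            : Health::OK;
}
```

With Y = 300 seconds and X = 1, a single PG that has been non-active for ten minutes would be enough to flag an error, while a PG that went inactive ten seconds ago would not count yet.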