Re: Querying since when a PG is inactive

Sage Weil <sage@xxxxxxxxxxxx> · Wed, 9 Dec 2015 05:50:44 -0800 (PST)

Hi Wido!

On Wed, 9 Dec 2015, Wido den Hollander wrote:
> Hi,
> 
> I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR
> if >= X PGs are stuck non-active.
> 
> This works for me now, but I would like to add a timer that a PG has to
> be inactive for more than Y seconds.
> 
> The PGMap contains "last_active" and "last_clean", but these timestamps
> are never updated. So I can't query for last_active <= (now() - 300) for
> example.
> 
> On an idle test cluster I have a PG for example:
> 
> "last_active": "2015-12-09 02:32:31.540712",
> 
> It's currently 08:53:56 here, so I can't check against last_active.
> 
> What would a good way be to see for how long a PG has been inactive?

It sounds like maybe the current code is subtly broken:

	https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566

The last_active, last_clean, etc. timestamps should be fresh to within
osd_pg_stat_report_interval_max seconds...
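
For illustration only, here is a rough sketch of the check being discussed,
assuming the timestamps really are refreshed as expected. It is standalone
Python, not Ceph code: the function names, the pgid-to-timestamp dict, and the
HEALTH_OK fallback are all made up for the example. Only the timestamp format
and the "X PGs inactive for more than Y seconds" rule come from this thread.

```python
from datetime import datetime

# Timestamp layout as it appears in the PG stats quoted above.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def inactive_seconds(last_active: str, now: datetime) -> float:
    """Seconds elapsed between a PG's last_active stamp and `now`."""
    return (now - datetime.strptime(last_active, TS_FORMAT)).total_seconds()


def stuck_inactive(pgs: dict, now: datetime, threshold_s: float) -> list:
    """Return the (hypothetical) pgids whose last_active is older than threshold_s."""
    return [
        pgid
        for pgid, last_active in pgs.items()
        if inactive_seconds(last_active, now) > threshold_s
    ]


def health(pgs: dict, now: datetime, threshold_s: float = 300, max_stuck: int = 1) -> str:
    """HEALTH_ERR once at least max_stuck PGs exceed the inactivity threshold.

    HEALTH_OK as the fallback is a simplification for this sketch.
    """
    stuck = stuck_inactive(pgs, now, threshold_s)
    return "HEALTH_ERR" if len(stuck) >= max_stuck else "HEALTH_OK"


if __name__ == "__main__":
    now = datetime(2015, 12, 9, 8, 53, 56)
    pgs = {"1.0": "2015-12-09 02:32:31.540712"}  # pgid is invented
    print(stuck_inactive(pgs, now, 300), health(pgs, now))
```

A check like this is only meaningful if the threshold is comfortably larger
than osd_pg_stat_report_interval_max. Otherwise a healthy PG whose stats are
simply due for their next report would look stuck.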

sage