Re: Querying since when a PG is inactive

Wido den Hollander <wido@xxxxxxxx> · Wed, 9 Dec 2015 17:14:21 +0100

On 12/09/2015 02:50 PM, Sage Weil wrote:
> Hi Wido!
> 
> On Wed, 9 Dec 2015, Wido den Hollander wrote:
>> Hi,
>>
>> I'm working on a patch in PGMonitor.cc that sets the state to HEALTH_ERR
>> if >= X PGs are stuck non-active.
>>
>> This works for me now, but I would like to add a timer, so that a PG
>> only counts once it has been inactive for more than Y seconds.
>>
>> The PGMap contains "last_active" and "last_clean", but these timestamps
>> are never updated. So I can't query for last_active <= (now() - 300),
>> for example.
>>
>> On an idle test cluster I have a PG with, for example:
>>
>> "last_active": "2015-12-09 02:32:31.540712",
>>
>> It's currently 08:53:56 here, so I can't check against last_active.
>>
>> What would be a good way to see how long a PG has been inactive?
> 
> It sounds like maybe the current code is subtly broken:
> 
> 	https://github.com/ceph/ceph/blob/master/src/osd/PG.cc#L2566
> 
> The last_active/clean etc should be fresh within 
> osd_pg_stat_report_interval_max seconds...
> 

Indeed, that seems broken. I created an issue for it:
http://tracker.ceph.com/issues/14028

I'm not sure where to start (yet).
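
For reference, the check I have in mind looks roughly like the sketch
below. It is Python purely for illustration, not the C++ in PGMonitor.cc.
The function names, the list-of-dicts layout, and the 300 second / 1 PG
defaults are all assumptions made up for this example, and it assumes the
timestamps and "now" are in the same time zone. It also only gives a
meaningful answer once last_active is actually kept fresh for active PGs,
which is exactly the problem in the issue above.

```python
from datetime import datetime

# Timestamp layout as printed in the PG stats, e.g. "2015-12-09 02:32:31.540712"
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def inactive_seconds(last_active, now):
    """Seconds elapsed between a PG's last_active stamp and `now`."""
    stamp = datetime.strptime(last_active, TS_FORMAT)
    return (now - stamp).total_seconds()


def stuck_inactive(pgs, now, min_age=300):
    """Return ids of PGs that are not active and were last active
    more than min_age seconds ago.

    `pgs` is a hypothetical list of dicts with "pgid", "state" and
    "last_active" keys; it is not the real PGMap structure.
    """
    stuck = []
    for pg in pgs:
        # States are '+'-joined flags, e.g. "active+clean" or "down+peering"
        is_active = "active" in pg["state"].split("+")
        if not is_active and inactive_seconds(pg["last_active"], now) > min_age:
            stuck.append(pg["pgid"])
    return stuck


def health(pgs, now, max_stuck=1, min_age=300):
    """HEALTH_ERR once at least max_stuck PGs have been inactive too long."""
    if len(stuck_inactive(pgs, now, min_age)) >= max_stuck:
        return "HEALTH_ERR"
    return "HEALTH_OK"
```

Here min_age plays the role of Y and max_stuck the role of X from my
original mail.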

> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

-- 
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on