Re: Blocked / Slow requests in health JSON from Mon/Mgr

On Mon, Nov 27, 2017 at 10:29 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
> Hi,
>
> For the Zabbix plugin for the Mgr I wanted to report the number of blocked and/or slow requests the cluster is experiencing.
>
> There is no item with an int value in the JSON returned by the Monitors.
>
> What would be the easiest way to obtain these values in a Mgr Module?
>
> Or would we need to expand the JSON the MON reports?
>
> I'd like to make a trigger in Zabbix so that if the number of slow requests is > X, an admin is alerted.
>
> Right now you would have to parse a string, which isn't very stable.

Kefu has been working on the health checks for slow requests:
https://github.com/ceph/ceph/pull/18614
https://github.com/ceph/ceph/pull/19114

Currently, health checks are very string-ish, but I would really like
them to have more machine-readable stuff (i.e. expand the
health_check_t structure with a generic map to store json-encodable
metadata), and populate that in the same places we generate strings
(e.g. in this instance where PGMap generates the
REQUEST_SLOW/REQUEST_STUCK health checks).
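To illustrate the difference for a Mgr module consumer, here is a minimal Python sketch. It is not Ceph code, and the details below are assumptions:

- The health dict is shaped loosely like Luminous-era health JSON (a "checks" map keyed by codes such as REQUEST_SLOW, each with a "summary" message).
- The exact summary wording is illustrative.
- The "metadata" map and its "count" key are hypothetical. They stand in for the machine-readable field proposed above, which the monitors do not emit today.

```python
import re
from typing import Optional

# Hypothetical health report. "metadata"/"count" are ASSUMED names for the
# proposed machine-readable field; the message text is only illustrative.
HEALTH = {
    "status": "HEALTH_WARN",
    "checks": {
        "REQUEST_SLOW": {
            "severity": "HEALTH_WARN",
            "summary": {"message": "3 slow requests are blocked > 32 sec"},
            "metadata": {"count": 3},
        }
    },
}


def slow_requests_from_string(check: dict) -> Optional[int]:
    """Fragile path: pull the leading integer out of the summary text."""
    m = re.match(r"(\d+) slow requests?", check["summary"]["message"])
    return int(m.group(1)) if m else None


def slow_requests_from_metadata(check: dict) -> Optional[int]:
    """Structured path: read an int from the (hypothetical) metadata map."""
    value = check.get("metadata", {}).get("count")
    return value if isinstance(value, int) else None


def slow_request_count(health: dict, code: str = "REQUEST_SLOW") -> int:
    """Return the slow-request count, preferring structured metadata."""
    check = health.get("checks", {}).get(code)
    if check is None:
        return 0  # check not raised, so no slow requests are reported
    n = slow_requests_from_metadata(check)
    if n is None:
        # Fall back to scraping the human-readable string.
        n = slow_requests_from_string(check)
    return n or 0
```

With a structured field, a Zabbix item could report `slow_request_count(...)` directly. The regex fallback breaks as soon as the message wording changes.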

BTW, I'm curious about the use case for thresholding on the number of
slow requests: wouldn't you want to alert the admin even if there was
only one?  If there are false positives, then maybe
mon_osd_warn_op_age is the thing to adjust.

John