Re: How to get num ops blocked per OSD

"Anthony D'Atri" <anthony.datri@xxxxxxxxx> · Fri, 13 Mar 2020 16:48:50 -0700

Yeah, the removal of that was annoying for sure. ISTR that one can gather the information from the OSDs’ admin sockets.

Envision a Prometheus exporter that polls the admin sockets (in parallel) and Grafana panels that graph slow requests by OSD and by node.
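
A rough sketch of the polling half, in Python, in case it helps. This is untested against a live cluster and rests on assumptions: that `ceph daemon osd.N dump_blocked_ops` is available on your release, that its JSON has an `ops` list, and that each op carries an `age` in seconds. Check the real output on one OSD first. The function names are made up for illustration, and the Prometheus side is left out entirely.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def query_blocked_ops(osd_id):
    """Ask one OSD's admin socket for its blocked ops.

    Must run on the host that owns the OSD. The command name and the
    shape of its JSON output are assumptions; verify on your release.
    """
    out = subprocess.check_output(
        ["ceph", "daemon", f"osd.{osd_id}", "dump_blocked_ops"]
    )
    return json.loads(out)


def summarize(osd_id, dump):
    """Reduce one dump to (osd_id, blocked op count, oldest op age in seconds)."""
    ops = dump.get("ops", [])
    # The per-op "age" field is assumed; a missing field counts as 0.0.
    oldest = max((float(op.get("age", 0.0)) for op in ops), default=0.0)
    return osd_id, len(ops), oldest


def rank(summaries):
    """Most blocked ops first; ties go to the OSD with the oldest blocked op."""
    return sorted(summaries, key=lambda s: (s[1], s[2]), reverse=True)


def poll(osd_ids, workers=8):
    """Poll the given local OSDs in parallel and return them ranked."""
    osd_ids = list(osd_ids)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        dumps = list(pool.map(query_blocked_ops, osd_ids))
    return rank(summarize(i, d) for i, d in zip(osd_ids, dumps))
```

Running something like `poll([0, 1, 2])` on each OSD node would give a per-node list with the most suspicious OSD first, which is roughly the ordering the old health-detail script produced.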

> On Mar 13, 2020, at 4:14 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> 
> For Jewel I wrote a script to take the output of `ceph health detail
> --format=json` and send alerts to our system that ordered the osds based on
> how long the ops were blocked and which OSDs had the most ops blocked. This
> was really helpful to quickly identify which OSD out of a list of 100 would
> be the most probable one having issues. Since upgrading to Luminous, I
> don't get that and I'm not sure where that info went to. Do I need to query
> the manager now?
> 
> This is the regex I was using to extract the pertinent information:
> 
> '^(\d+) ops are blocked > (\d+\.+\d+) sec on osd\.(\d+)$'
> 
> Thanks,
> Robert LeBlanc
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx