We are already gathering the Ceph admin socket stats with the Diamond
plugin and sending that to graphite, so I guess I just need to look
through that to find what I'm looking for.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

On Fri, Mar 13, 2020 at 4:48 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:

> Yeah the removal of that was annoying for sure. ISTR that one can gather
> the information from the OSDs’ admin sockets.
>
> Envision a Prometheus exporter that polls the admin sockets (in parallel)
> and Grafana panes that graph slow requests by OSD and by node.
>
> > On Mar 13, 2020, at 4:14 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
> >
> > For Jewel I wrote a script to take the output of
> > `ceph health detail --format=json` and send alerts to our system that
> > ordered the osds based on how long the ops were blocked and which OSDs
> > had the most ops blocked. This was really helpful to quickly identify
> > which OSD out of a list of 100 would be the most probable one having
> > issues. Since upgrading to Luminous, I don't get that and I'm not sure
> > where that info went to. Do I need to query the manager now?
> >
> > This is the regex I was using to extract the pertinent information:
> >
> > '^(\d+) ops are blocked > (\d+\.+\d+) sec on osd\.(\d+)$'
> >
> > Thanks,
> > Robert LeBlanc

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
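
For reference, the ranking described in the quoted Jewel-era message (order OSDs by how long their ops were blocked, then by how many ops were blocked) could be sketched roughly as below. This is not the original script, which was not posted. The function name `rank_blocked_osds` and the sample lines are made up for illustration, and the code assumes the detail strings have already been pulled out of the `ceph health detail --format=json` output. The regex is the one quoted above, with `\.+` tightened to `\.` so the captured duration always parses as a float.

```python
import re
from collections import defaultdict

# Regex from the thread, with `\.+` tightened to `\.` (an assumption made
# here so that float() never sees a value such as "32..768").
BLOCKED_RE = re.compile(r'^(\d+) ops are blocked > (\d+\.\d+) sec on osd\.(\d+)$')


def rank_blocked_osds(lines):
    """Return [(osd_id, longest_block_sec, total_blocked_ops)], worst first.

    Hypothetical helper: `lines` is any iterable of health detail strings.
    Lines that do not match the blocked-ops pattern are ignored.
    """
    longest = defaultdict(float)  # osd id -> longest blocked duration seen
    total = defaultdict(int)      # osd id -> sum of blocked op counts
    for line in lines:
        m = BLOCKED_RE.match(line.strip())
        if not m:
            continue
        ops, secs, osd = int(m.group(1)), float(m.group(2)), int(m.group(3))
        longest[osd] = max(longest[osd], secs)
        total[osd] += ops
    # Sort by longest blocked duration first, then by number of blocked ops.
    return sorted(
        ((osd, longest[osd], total[osd]) for osd in longest),
        key=lambda row: (row[1], row[2]),
        reverse=True,
    )


# Illustrative input only; these are not lines captured from a real cluster.
sample = [
    "3 ops are blocked > 65.536 sec on osd.12",
    "1 ops are blocked > 131.072 sec on osd.7",
    "5 ops are blocked > 32.768 sec on osd.12",
    "some unrelated health detail line",
]

print(rank_blocked_osds(sample))
# prints [(7, 131.072, 1), (12, 65.536, 8)]
```

Sorting on duration before op count puts the OSD with the single longest-blocked op at the top, which matches the "most probable one having issues" goal in the message. Swapping the two sort keys would favour OSDs with many shorter stalls instead.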