To me it seems most users probably use Prometheus, which doesn't have this kind of issue.
Monitor down is also easy as pie, because it's just "num_mon - mon_quorum". There is also a metric mon_outside_quorum, which is always zero for me, and I don't really know how it works.
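A minimal sketch of that subtraction, in case it helps. The function name is made up for illustration, and the two arguments are simply the "num_mon" and "mon_quorum" values mentioned above, however you read them out of your database:

```python
def monitors_down(num_mon: int, mon_quorum: int) -> int:
    """Return how many monitors are missing from the quorum.

    Hypothetical helper: num_mon is the total number of monitors and
    mon_quorum is the number currently in quorum, as named in this thread.
    """
    if num_mon < 0 or mon_quorum < 0:
        raise ValueError("monitor counts cannot be negative")
    # Clamp at zero so a briefly inconsistent sample never reports a negative count.
    return max(num_mon - mon_quorum, 0)


if __name__ == "__main__":
    print(monitors_down(3, 2))
```

Alerting on any result greater than zero is one obvious choice, but that threshold is up to you.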
OSD near full will probably be trickier: for each OSD you have to compute "osd.stat_bytes_used / osd.stat_bytes" and compare it with your own configured value (it is not a metric, so it is not exported).
Or you can just watch the general cluster health metric (which you should do anyway) and raise a general alarm in this case.
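A rough sketch of the per-OSD check, under a few assumptions: the function name and the dictionary layout are invented for illustration, the byte counts stand for the "osd.stat_bytes_used" and "osd.stat_bytes" values mentioned above, and 0.85 is only a placeholder for whatever near-full ratio your cluster is actually configured with:

```python
from typing import Dict, List, Tuple

# Assumption: placeholder threshold; substitute your cluster's configured near-full ratio.
NEARFULL_RATIO = 0.85


def nearfull_osds(
    stats: Dict[str, Tuple[int, int]], ratio: float = NEARFULL_RATIO
) -> List[str]:
    """Return the OSDs whose used/total ratio is at or above `ratio`.

    `stats` maps an OSD name to (stat_bytes_used, stat_bytes). This layout is
    hypothetical; adapt it to however you fetch the values.
    """
    flagged = []
    for osd, (used, total) in stats.items():
        if total <= 0:
            # No capacity reported for this OSD, so no ratio can be computed.
            continue
        if used / total >= ratio:
            flagged.append(osd)
    return sorted(flagged)


if __name__ == "__main__":
    sample = {"osd.0": (90, 100), "osd.1": (50, 100), "osd.2": (0, 0)}
    print(nearfull_osds(sample))
```

The threshold has to be kept in sync with the cluster configuration by hand, which is exactly the awkward part described above.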
M.
On 11. 12. 19 21:18, Mario Giammarco wrote:
Miroslav explained better than we did why it "is not so simple" to use math. And OSD down was the easiest case. How can I calculate:
- monitor down
- osd near full
?
I do not understand why the Ceph plugin cannot send all the metrics it has to Influx, especially the ones most useful for creating alarms.
On Wed, 11 Dec 2019 at 04:58, Konstantin Shalygin <k0ste@xxxxxxxx> wrote:
> But it is very difficult/complicated to make simple queries because, for example, I have osd up and osd total but not an osd down metric.
To determine how many OSDs are down you don't need a special metric, because you already have osd_up and osd_in metrics. Just use math.
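As a minimal sketch of the math being suggested: this assumes a total OSD count is available next to the up and in counts, as the quoted message says. The function and argument names are illustrative, not actual metric names:

```python
from typing import Tuple


def osd_down_and_out(osd_total: int, osd_up: int, osd_in: int) -> Tuple[int, int]:
    """Derive (down, out) counts from the total, up and in counts.

    Hypothetical helper: the three arguments are whatever your queries
    return for the total, up and in OSD counts.
    """
    down = max(osd_total - osd_up, 0)  # OSDs that are not up
    out = max(osd_total - osd_in, 0)   # OSDs that are not in
    return down, out


if __name__ == "__main__":
    print(osd_down_and_out(12, 10, 11))
```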
k
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
-- Miroslav Kalina Systems development specialist miroslav.kalina@xxxxxxxxxxxx +420 773 071 848 Livesport s.r.o. Aspira Business Centre Bucharova 2928/14a, 158 00 Praha 5 www.livesport.eu