Slack bot for Ceph

David Turner <drakonstein@xxxxxxxxx> · Thu, 31 May 2018 13:01:28 -0400

https://github.com/drakonstein/cephbot

This is something I've been using and working on for a while.  My Python abilities are subpar at best, but it has been very useful in my environments: I use it for my home cluster and for multiple clusters at work.  The biggest gain is being able to check a cluster's status when a page goes out saying something is wrong.  You can simply ask cephbot in Slack without needing to pull out a laptop, connect to a VPN, etc.
This is my first git project, and the setup instructions in my documentation/README are pretty lacking.  If anyone is interested in trying out the instructions and letting me know what's missing, or in submitting a PR with better instructions after getting it working, that would be greatly appreciated. :)

Here are a few examples of cephbot in action.  Note that I have multiple instances running with the same Slack bot token and ID, and that I'm using CLUSTER_GROUP groupings (e.g. "prod" and "stage" below) to make checking multiple clusters at once easy.
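One way several instances can share a single bot token is for every instance to see each mention and decide on its own whether to answer. The sketch below is hypothetical, not cephbot's actual code: it assumes each instance knows its own cluster name and a set of group names (the `cluster`/`groups` parameters and the `COMMANDS` set are illustrative). A request with no prefix, such as `help`, is answered by every instance. In raw Slack message text a mention arrives as `<@USERID>`, so a leading mention token is dropped first.

```python
from typing import Optional, Set

# Hypothetical set of commands this instance understands (taken from the help output).
COMMANDS = {
    "help", "status", "health", "io", "osd stat", "mon stat", "pg stat",
    "down osds", "blocked requests", "df", "osd df", "fs dump", "pool io",
}


def parse_request(text: str, cluster: str, groups: Set[str]) -> Optional[str]:
    """Return the command this instance should run, or None to stay silent."""
    words = text.split()
    # Drop the bot mention ("@cephbot" in the UI, "<@USERID>" in raw event text).
    if words and words[0].startswith(("@", "<@")):
        words = words[1:]
    request = " ".join(words)
    # No cluster/group prefix: every instance answers.
    if request in COMMANDS:
        return request
    # Prefixed request: answer only if the prefix is our cluster or one of our groups.
    if words and (words[0] == cluster or words[0] in groups):
        rest = " ".join(words[1:])
        if rest in COMMANDS:
            return rest
    return None
```

With this scheme, `@cephbot prod health` is answered by every instance configured with group `prod`, while `@cephbot c3 down osds` is answered only by the `c3` instance.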

David Turner [12:20 PM]
@cephbot help
cephbot APP [12:20 PM]
c3: status, health, io, osd stat, mon stat, pg stat, down osds, blocked requests, df, osd df, fs dump, pool io
f1: status, health, io, osd stat, mon stat, pg stat, down osds, blocked requests, df, osd df, fs dump, pool io
c1: status, health, io, osd stat, mon stat, pg stat, down osds, blocked requests, df, osd df, pool io
c5: status, health, io, osd stat, mon stat, pg stat, down osds, blocked requests, df, osd df, pool io
f3: status, health, io, osd stat, mon stat, pg stat, down osds, blocked requests, df, osd df, fs dump, pool io
f2: status, health, io, osd stat, mon stat, pg stat, down osds, blocked requests, df, osd df, fs dump, pool io
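In the help output, c1 and c5 omit `fs dump`, which presumably means those clusters run no CephFS. A minimal sketch of advertising and building commands per cluster, assuming a per-instance `has_cephfs` flag and assuming the Slack label doubles as the Ceph cluster name passed to `ceph --cluster`. The mapping from bot commands to CLI arguments (for example `pool io` to `ceph osd pool stats`) is a guess, not taken from the cephbot repo; commands that post-process other output (`io`, `down osds`, `blocked requests`) are left out of it.

```python
from typing import List, Optional

# Bot commands in the order the help output lists them.
HELP_ORDER = [
    "status", "health", "io", "osd stat", "mon stat", "pg stat",
    "down osds", "blocked requests", "df", "osd df", "fs dump", "pool io",
]

# Hypothetical mapping of the simple commands onto `ceph` CLI arguments.
CEPH_ARGS = {
    "status": ["status"],
    "health": ["health"],
    "osd stat": ["osd", "stat"],
    "mon stat": ["mon", "stat"],
    "pg stat": ["pg", "stat"],
    "df": ["df"],
    "osd df": ["osd", "df"],
    "fs dump": ["fs", "dump"],
    "pool io": ["osd", "pool", "stats"],
}


def available_commands(has_cephfs: bool) -> List[str]:
    """Drop 'fs dump' on clusters without CephFS."""
    return [c for c in HELP_ORDER if has_cephfs or c != "fs dump"]


def help_line(cluster: str, has_cephfs: bool) -> str:
    """Format one line of the help reply, e.g. 'c1: status, health, ...'."""
    return f"{cluster}: " + ", ".join(available_commands(has_cephfs))


def build_command(cluster: str, command: str, has_cephfs: bool) -> Optional[List[str]]:
    """Return an argv list suitable for subprocess, or None if not handled here."""
    if command not in available_commands(has_cephfs):
        return None
    args = CEPH_ARGS.get(command)
    if args is None:
        return None
    return ["ceph", "--cluster", cluster] + args
```

The returned list can be handed to `subprocess.run(argv, capture_output=True, text=True)` on a host that has the cluster's config and keyring.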

David Turner [12:21 PM]
@cephbot prod health
cephbot APP [12:21 PM]
f3: HEALTH_OK
f2: HEALTH_OK
c3: HEALTH_WARN noout flag(s) set; 16 osds down; 2 hosts (16 osds) down; Degraded data redundancy: 163014044/3748375787 objects degraded (4.349%), 2062 pgs unclean, 2062 pgs degraded, 2062 pgs undersized
c5: HEALTH_OK

David Turner [12:21 PM]
@cephbot c3 down osds
cephbot APP [12:21 PM]
ssd
   sto5-ssd
       osd.215
default
   sto5
       osd.60, osd.61, osd.62, osd.63, osd.64, osd.65, osd.66, osd.67, osd.68, osd.69, osd.70, osd.71, osd.72, osd.73, osd.74
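The `down osds` reply groups down OSDs by CRUSH root and then by host. A sketch of producing that layout from the parsed output of `ceph osd tree -f json`, whose `nodes` entries carry `id`, `name`, `type`, `children` and, for OSDs, `status`. It assumes hosts sit directly under their root (no rack or row buckets in between); the function name and the "no down osds" fallback are illustrative, and this is not necessarily how cephbot does it.

```python
from typing import Any, Dict, List


def format_down_osds(tree: Dict[str, Any]) -> str:
    """Render down OSDs as root -> host -> comma-separated OSD names."""
    nodes = {n["id"]: n for n in tree["nodes"]}
    lines: List[str] = []
    for root in (n for n in tree["nodes"] if n["type"] == "root"):
        host_lines: List[str] = []
        for child_id in root.get("children", []):
            host = nodes.get(child_id)
            # Assumption: hosts are direct children of the root bucket.
            if host is None or host["type"] != "host":
                continue
            down = sorted(
                (nodes[c] for c in host.get("children", [])
                 if c in nodes and nodes[c].get("status") == "down"),
                key=lambda n: n["id"],
            )
            if down:
                host_lines.append("    " + host["name"])
                host_lines.append("        " + ", ".join(n["name"] for n in down))
        # Only print roots that actually contain down OSDs.
        if host_lines:
            lines.append(root["name"])
            lines.extend(host_lines)
    return "\n".join(lines) if lines else "no down osds"
```

In practice `tree` would come from `json.loads()` on the command's output.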

David Turner [12:24 PM]
@cephbot prod io
cephbot APP [12:24 PM]
c3: client:   49005 kB/s rd, 59702 kB/s wr, 337 op/s rd, 248 op/s wr
f2: client:   85 B/s rd, 8610 B/s wr, 0 op/s rd, 3 op/s wr
f3: client:   58876 B/s rd, 151 kB/s wr, 3 op/s rd, 18 op/s wr
c5: nothing is going on
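The `io` reply looks like the `client:` line from the io section of `ceph status`, with a fallback message when there is no client activity. A minimal sketch, assuming the plain-text status layout where that line reads `client:   ... rd, ... wr, ...`; the `client_io` and `reply_line` names are illustrative.

```python
def client_io(status_text: str) -> str:
    """Return the 'client:' io line from plain-text `ceph status` output."""
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("client:"):
            return stripped
    # No client io line means no client activity was reported.
    return "nothing is going on"


def reply_line(cluster: str, status_text: str) -> str:
    """Prefix the io summary with the cluster label, as in the chat replies."""
    return f"{cluster}: {client_io(status_text)}"
```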

David Turner [12:28 PM]
@cephbot stage osd stat
cephbot APP [12:28 PM]
c1: 25 osds: 25 up, 25 in
f1: 12 osds: 12 up, 12 in
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com