Re: HEALTH_OK when one server crashed?

Hello,

On Thu, 12 Jan 2017 14:35:32 +0000 Matthew Vernon wrote:

> Hi,
> 
> One of our ceph servers froze this morning (no idea why, alas). Ceph
> noticed, moved things around, and when I ran ceph -s, said:
> 
> root@sto-1-1:~# ceph -s
>     cluster 049fc780-8998-45a8-be12-d3b8b6f30e69
>      health HEALTH_OK
>      monmap e2: 3 mons at
> {sto-1-1=172.27.6.11:6789/0,sto-2-1=172.27.6.14:6789/0,sto-3-1=172.27.6.17:6789/0}
>             election epoch 250, quorum 0,1,2 sto-1-1,sto-2-1,sto-3-1
>      osdmap e9899: 540 osds: 480 up, 480 in
>             flags sortbitwise
>       pgmap v4549229: 20480 pgs, 25 pools, 7559 GB data, 1906 kobjects
>             22920 GB used, 2596 TB / 2618 TB avail
>                20480 active+clean
>   client io 5416 kB/s rd, 6598 kB/s wr, 44 op/s rd, 53 op/s wr
> 
> Is it intentional that it says HEALTH_OK when an entire server's worth
> of OSDs are dead? you have to look quite hard at the output to notice
> that 60 OSDs are unaccounted for.
> 
What Wido said.
Though there have been several discussions, and much nodding of heads,
that the current health states of Ceph are pitifully limited and for many
people simply inaccurate.
The idea being to separate them into something like OK, INFO, WARN and ERR,
and to have configuration options to determine which situation equates to
which state.
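
To illustrate the idea (purely hypothetical, not anything Ceph actually
implements), a configurable mapping from "how many OSDs are down" to a
severity could look like the sketch below; the thresholds and names are
made up for the example:

#!/usr/bin/env python
# Hypothetical sketch of configurable health severities.
# The thresholds below are examples only, not Ceph defaults.
RULES = [
    (0.10, "ERR"),   # more than 10% of OSDs down: error
    (0.01, "WARN"),  # more than 1% down: warning
    (0.0,  "INFO"),  # any OSD down at all: informational
]

def osd_state(num_osds, num_up_osds):
    down_ratio = float(num_osds - num_up_osds) / num_osds
    for threshold, state in RULES:
        if down_ratio > threshold:
            return state
    return "OK"

# With the numbers from the report above (540 OSDs, 480 up) this
# yields "ERR" rather than HEALTH_OK: about 11% of the OSDs are down.
print(osd_state(540, 480))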

Of course you should be monitoring your cluster with other tools like
Nagios anyway: everything from general availability on all network ports,
disk usage and SMART wear-out levels of SSDs, down to the individual
processes you'd expect to see running on a node, e.g.:
"PROCS OK: 8 processes with command name 'ceph-osd'"

I've lost single OSDs a few times myself and only noticed by looking at
Nagios, as the recovery was so quick.

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


