Re: Cluster falls when one node is switched off

Hello,

On Tue, 24 May 2016 10:28:02 +0700 Никитенко Виталий wrote:

> Hello!
> I have a cluster of 2 nodes with 3 OSDs each. The cluster is about 80% full.
> 
According to your CRUSH map that's not quite true, namely the ceph1-node2
entry.

And while that bucket, again according to your CRUSH map, isn't in the
default root, I wonder WHERE it is and whether it confuses Ceph into
believing that there is actually a third node?

"ceph osd tree" output may help, as well as removing ceph1-node2 from the
picture.
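
If ceph1-node2 turns out to be a stray leftover, the usual way to get rid
of it is to pull the CRUSH map, decompile it, delete that host bucket and
inject the map again. A rough sketch (the paths are just examples):

  # export and decompile the current CRUSH map
  ceph osd getcrushmap -o /tmp/crushmap
  crushtool -d /tmp/crushmap -o /tmp/crushmap.txt

  # edit /tmp/crushmap.txt and delete the whole "host ceph1-node2" bucket,
  # then compile and inject it again
  crushtool -c /tmp/crushmap.txt -o /tmp/crushmap.new
  ceph osd setcrushmap -i /tmp/crushmap.new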

> df -H
> /dev/sdc1        27G   24G  3.9G  86% /var/lib/ceph/osd/ceph-1
> /dev/sdd1        27G   20G  6.9G  75% /var/lib/ceph/osd/ceph-2
> /dev/sdb1        27G   24G  3.5G  88% /var/lib/ceph/osd/ceph-0
> 
> When I switch off one server, then after 10 minutes the pgs start to get remapped.
> 
[snip]
> As a result, one disk overflows and the cluster falls. Why does ceph remap
> pgs? It was supposed to simply mark all pgs as active+degraded while the
> second node is down.
> 
> 

Yes, I agree, that shouldn't happen with a properly configured 2 node
cluster.
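
The 10 minute delay you see is most likely the default
"mon osd down out interval" (600 seconds): once it expires the down OSDs
get marked out and Ceph starts remapping onto whatever is left. As a
stop-gap while you investigate (and assuming you can live with running
degraded), you can keep OSDs from being marked out at all:

  # don't mark down OSDs out, so nothing gets remapped
  ceph osd set noout

  # once the node is back and recovered, clear the flag again
  ceph osd unset noout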

> ceph version 0.80.11
> 
I'm not aware of any bugs in there, and in fact I did test a 2-node cluster
with Firefly. But be aware that this version is EoL and no longer receives
updates.

> root@ceph1-node:~# cat /etc/ceph/ceph.conf 
> [global]
> fsid = b66c7daa-d6d8-46c7-9e61-15adbb749ed7
> mon_initial_members = ceph1-node, ceph2-node, ceph-mon2
> mon_host = 192.168.241.97,192.168.241.110,192.168.241.123
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_pool_default_size = 2
> osd_pool_default_min_size = 1

Have you verified (ceph osd pool get <poolname> size / min_size) that all
your pools are actually set like this?
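
For example, something along these lines should show it for every pool:

  for pool in $(rados lspools); do
          echo "== $pool =="
          ceph osd pool get $pool size
          ceph osd pool get $pool min_size
  done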

Regards,

Christian
> mon_clock_drift_allowed = 2
> 
> 
> root@ceph1-node:~# cat crush-map.txt 
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
> tunable straw_calc_version 1
> 
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> 
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
> 
> # buckets
> host ceph1-node {
>         id -2           # do not change unnecessarily
>         # weight 0.060
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 0.020
>         item osd.1 weight 0.020
>         item osd.2 weight 0.020
> }
> 
> host ceph2-node {
>         id -3           # do not change unnecessarily
>         # weight 0.060
>         alg straw
>         hash 0  # rjenkins1
>         item osd.3 weight 0.020
>         item osd.4 weight 0.020
>         item osd.5 weight 0.020
> }
> root default {
>         id -1           # do not change unnecessarily
>         # weight 0.120
>         alg straw
>         hash 0  # rjenkins1
>         item ceph1-node weight 0.060
>         item ceph2-node weight 0.060
> }
> host ceph1-node2 {
>         id -4           # do not change unnecessarily
>         # weight 3.000
>         alg straw
>         hash 0  # rjenkins1
>         item osd.0 weight 1.000
>         item osd.1 weight 1.000
>         item osd.2 weight 1.000
> }
> 
> # rules
> rule replicated_ruleset {
>         ruleset 0
>         type replicated
>         min_size 1
>         max_size 10
>         step take default
>         step chooseleaf firstn 0 type host
>         step emit
> }
> # end crush map
> 
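
(Once you have a map without the stray bucket you can also check offline
where that replicated_ruleset actually puts 2 replicas, something like the
following; crushtool options from memory, double-check the man page:

  crushtool -i /tmp/crushmap.new --test --rule 0 --num-rep 2 --show-mappings
)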


-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



