Hello,
I have created a small 16-PG EC pool with k=4, m=2.
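For context, the pool was created along these lines (the profile and
pool names here are just placeholders, not necessarily what I typed):

    ceph osd erasure-code-profile set ec_4_2 k=4 m=2
    ceph osd pool create ecpool 16 16 erasure ec_4_2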
Then I applied the following CRUSH rule to it:
rule test_ec {
        id 99
        type erasure
        min_size 5
        max_size 6
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step choose indep 3 type host
        step chooseleaf indep 2 type osd
        step emit
}
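For completeness, the rule was injected and can be sanity-checked
offline roughly like this (sketched from memory; the pool name is a
placeholder):

    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # add the rule above to crushmap.txt, then recompile and inject it
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new
    ceph osd pool set <poolname> crush_rule test_ec
    # simulate the placements the rule produces for a 6-shard pool
    crushtool -i crushmap.new --test --rule 99 --num-rep 6 --show-mappings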
The OSD tree looks as follows:
ID  CLASS WEIGHT   TYPE NAME                               STATUS REWEIGHT PRI-AFF
 -1       43.38448 root default
 -9       43.38448     region lab1
 -7       43.38448         room dc1.lab1
 -5       43.38448             rack r1.dc1.lab1
 -3       14.44896                 host host1.r1.dc1.lab1
  6   hdd  3.63689                     osd.6                  up  1.00000 1.00000
  8   hdd  3.63689                     osd.8                  up  1.00000 1.00000
  7   hdd  3.63689                     osd.7                  up  1.00000 1.00000
 11   hdd  3.53830                     osd.11                 up  1.00000 1.00000
-11       14.44896                 host host2.r1.dc1.lab1
  4   hdd  3.63689                     osd.4                  up  1.00000 1.00000
  9   hdd  3.63689                     osd.9                  up  1.00000 1.00000
  5   hdd  3.63689                     osd.5                  up  1.00000 1.00000
 10   hdd  3.53830                     osd.10                 up  1.00000 1.00000
-13       14.48656                 host host3.r1.dc1.lab1
  0   hdd  3.57590                     osd.0                  up  1.00000 1.00000
  1   hdd  3.63689                     osd.1                  up  1.00000 1.00000
  2   hdd  3.63689                     osd.2                  up  1.00000 1.00000
  3   hdd  3.63689                     osd.3                  up  1.00000 1.00000
My expectation was that each host would hold exactly two shards of
every PG in the pool. When I dumped the PGs this was mostly true, but
one group has shards on OSDs 3, 0 and 2, which all sit on host3. That
puts three of its six shards on a single host, and since k=4, m=2 only
tolerates the loss of two shards, a failure of host3 would make that
PG unavailable.
root@host1:~/mkw # ceph pg dump|grep "^66\."|awk '{print $17}'
dumped all
[4,5,7,6,1,2]
[8,11,9,3,0,2]   <<< this one is problematic
[6,7,10,9,2,0]
[2,3,7,6,5,9]
[7,8,10,5,3,1]
[4,5,8,6,0,2]
[7,11,9,4,1,2]
[5,9,0,2,7,11]
[9,5,3,1,7,8]
[8,11,2,0,5,9]
[2,0,8,6,10,9]
[3,2,5,9,7,11]
[6,7,9,5,1,2]
[10,5,1,3,11,8]
[4,5,7,8,2,0]
[7,8,3,2,9,10]
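To confirm which host each shard of the problematic PG lives on, a
quick loop like this should do (assuming jq is available):

    for osd in 8 11 9 3 0 2; do
        echo -n "osd.$osd -> "
        ceph osd find $osd | jq -r '.crush_location.host'
    done

Consistent with the tree above, that shows host1 twice, host2 once and
host3 three times.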
Is there a way to ensure that a host failure is not disruptive to the cluster?
During the experiment I used info from this thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030227.html
Kind regards,
Maks Kowalik