Re: EC pool 4+2 - failed to guarantee a failure domain

Hi,

this is unexpected, of course, but it can happen if one OSD is full (or perhaps even nearfull?). Have you checked 'ceph osd df'? PG availability takes priority over placement, so during a failure some chunks can be recreated on the same OSD or host even if the crush rule shouldn't allow that.
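
For example, something like this would show both the per-OSD utilisation and the configured thresholds (just the standard CLI, shown here as a sketch):

# per-OSD utilisation, grouped by the crush tree
ceph osd df tree

# the currently configured full / backfillfull / nearfull ratios
ceph osd dump | grep -E 'full_ratio|backfillfull_ratio|nearfull_ratio'

# any nearfull/full warnings currently raised
ceph health detail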

Regards,
Eugen


Quoting Maks Kowalik <maks_kowalik@xxxxxxxxx>:

Hello,

I have created a small 16-PG EC pool with k=4, m=2.
Then I applied the following crush rule to it:

rule test_ec {
	id 99
	type erasure
	min_size 5
	max_size 6
	step set_chooseleaf_tries 5
	step set_choose_tries 100
	step take default
	step choose indep 3 type host
	step chooseleaf indep 2 type osd
	step emit
}
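
As a side note, such a rule can be dry-run with crushtool before applying it, roughly like this (file names are arbitrary):

# extract and decompile the current crush map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# add/edit the rule in crushmap.txt, then recompile it
crushtool -c crushmap.txt -o crushmap.new
# simulate placement for rule id 99 with 6 shards; --show-bad-mappings
# lists any input that maps to fewer OSDs than requested
crushtool -i crushmap.new --test --rule 99 --num-rep 6 --show-bad-mappings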

The OSD tree looks as follows:
 -1       43.38448 root default
 -9       43.38448     region lab1
 -7       43.38448         room dc1.lab1
 -5       43.38448             rack r1.dc1.lab1
 -3       14.44896                 host host1.r1.dc1.lab1
  6   hdd  3.63689                     osd.6      up  1.00000 1.00000
  8   hdd  3.63689                     osd.8      up  1.00000 1.00000
  7   hdd  3.63689                     osd.7      up  1.00000 1.00000
 11   hdd  3.53830                     osd.11     up  1.00000 1.00000
-11       14.44896                 host host2.r1.dc1.lab1
  4   hdd  3.63689                     osd.4      up  1.00000 1.00000
  9   hdd  3.63689                     osd.9      up  1.00000 1.00000
  5   hdd  3.63689                     osd.5      up  1.00000 1.00000
 10   hdd  3.53830                     osd.10     up  1.00000 1.00000
-13       14.48656                 host host3.r1.dc1.lab1
  0   hdd  3.57590                     osd.0      up  1.00000 1.00000
  1   hdd  3.63689                     osd.1      up  1.00000 1.00000
  2   hdd  3.63689                     osd.2      up  1.00000 1.00000
  3   hdd  3.63689                     osd.3      up  1.00000 1.00000

My expectation was that each host would contain 2 shards of every PG in the pool.

When I dumped the PGs, this was mostly true, but one group has three of its shards
on OSDs 0, 2 and 3 (all on host3), which would cause downtime in case of a host3 failure.
root@host1:~/mkw # ceph pg dump|grep "^66\."|awk '{print $17}'
dumped all
[4,5,7,6,1,2]

[8,11,9,3,0,2]  <<< - this one is problematic

[6,7,10,9,2,0]
[2,3,7,6,5,9]
[7,8,10,5,3,1]
[4,5,8,6,0,2]
[7,11,9,4,1,2]
[5,9,0,2,7,11]
[9,5,3,1,7,8]
[8,11,2,0,5,9]
[2,0,8,6,10,9]
[3,2,5,9,7,11]
[6,7,9,5,1,2]
[10,5,1,3,11,8]
[4,5,7,8,2,0]
[7,8,3,2,9,10]
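
A quick way to cross-check which host each shard of that PG landed on is a loop like this (assuming jq is available):

# print the crush host of every OSD in the problematic acting set
for osd in 8 11 9 3 0 2; do
    printf 'osd.%s -> ' "$osd"
    ceph osd find "$osd" | jq -r '.crush_location.host'
done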

Is there a way to ensure that a host failure is not disruptive to the cluster?

During the experiment I used info from this thread:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-October/030227.html

Kind regards,

Maks Kowalik


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


