Hi,

With 5 hosts, I could successfully create EC pools with k=4 and m=1, with the failure domain set to "host". With 6 hosts, I could also create k=4, m=1 EC pools. But I suddenly failed with 6 hosts and k=5, m=1, or k=4, m=2: the PGs were never created. (I reused the pool name between tests, which seems to matter, see below.)

HEALTH_WARN 512 pgs stuck inactive; 512 pgs stuck unclean
pg 159.70 is stuck inactive since forever, current state creating, last acting []
pg 159.71 is stuck inactive since forever, current state creating, last acting []
pg 159.72 is stuck inactive since forever, current state creating, last acting []

The pool looks like this:

[root@ceph0 ~]# ceph osd pool get testec erasure_code_profile
erasure_code_profile: erasurep4_2_host
[root@ceph0 ~]# ceph osd erasure-code-profile get erasurep4_2_host
directory=/usr/lib64/ceph/erasure-code
k=4
m=2
plugin=isa
ruleset-failure-domain=host
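For reference, the profile and pool were created with commands along these lines (reconstructed from memory, so take the exact syntax with a grain of salt; pg_num was 512, matching the 512 stuck PGs above):

[root@ceph0 ~]# ceph osd erasure-code-profile set erasurep4_2_host k=4 m=2 plugin=isa ruleset-failure-domain=host
[root@ceph0 ~]# ceph osd pool create testec 512 512 erasure erasurep4_2_host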
The PG list looks like this (all PGs are alike):

pg_stat objects mip degr misp unf bytes log disklog state state_stamp v reported up up_primary acting acting_primary last_scrub scrub_stamp last_deep_scrub deep_scrub_stamp
159.0 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-09-30 14:41:01.219196 0'0 2015-09-30 14:41:01.219196
159.1 0 0 0 0 0 0 0 0 creating 0.000000 0'0 0:0 [] -1 [] -1 0'0 2015-09-30 14:41:01.219197 0'0 2015-09-30 14:41:01.219197

I can't dump a PG (but if it's mapped to no OSD, then...):

[root@ceph0 ~]# ceph pg 159.0 dump
^CError EINTR: problem getting command descriptions from pg.159.0
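(My assumption is that the per-PG command hangs because it gets forwarded to the PG's acting primary, and here there is none. Monitor-only queries such as

[root@ceph0 ~]# ceph pg map 159.0
[root@ceph0 ~]# ceph pg dump_stuck inactive

should still answer, if that's of any use.)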
It just hangs. The OSD tree looks like this:

 -1 21.71997 root default
 -2  3.62000     host ceph4
  9  1.81000         osd.9   up 1.00000 1.00000
 15  1.81000         osd.15  up 1.00000 1.00000
 -3  3.62000     host ceph0
  5  1.81000         osd.5   up 1.00000 1.00000
 11  1.81000         osd.11  up 1.00000 1.00000
 -4  3.62000     host ceph1
  6  1.81000         osd.6   up 1.00000 1.00000
 12  1.81000         osd.12  up 1.00000 1.00000
 -5  3.62000     host ceph2
  7  1.81000         osd.7   up 1.00000 1.00000
 13  1.81000         osd.13  up 1.00000 1.00000
 -6  3.62000     host ceph3
  8  1.81000         osd.8   up 1.00000 1.00000
 14  1.81000         osd.14  up 1.00000 1.00000
-13  3.62000     host ceph5
 10  1.81000         osd.10  up 1.00000 1.00000
 16  1.81000         osd.16  up 1.00000 1.00000

The pool uses crush ruleset 1:

[root@ceph0 ~]# ceph osd pool get testec crush_ruleset
crush_ruleset: 1
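(Side note on my mental model, which may well be wrong: I understand that for an EC pool the pool size is k+m, so

[root@ceph0 ~]# ceph osd pool get testec size

should report 6 here, and I presume that is the value CRUSH checks against the rule's min_size/max_size.)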
Then I dumped that ruleset and noticed the "max_size": 5:

[root@ceph0 ~]# ceph osd crush rule dump testec
{ "rule_id": 1,
  "rule_name": "testec",
  "ruleset": 1,
  "type": 3,
  "min_size": 3,
  "max_size": 5,
  ...

I thought I should not care, since I'm not creating a replicated pool, but... I then deleted the pool, deleted the "testec" ruleset, re-created the pool and... boom, the PGs started being created!

Now the ruleset looks like this:

[root@ceph0 ~]# ceph osd crush rule dump testec
{ "rule_id": 1,
  "rule_name": "testec",
  "ruleset": 1,
  "type": 3,
  "min_size": 3,
  "max_size": 6,
  ^^^^^^^^^^^^^^
  ...

Is this a bug, or a "feature"? (If the latter, I'd be glad if someone could shed some light on it.) I'm presuming ceph treats each EC chunk as a replica, but then I'm failing to understand the documentation: I did not select a crush ruleset when I created the pool. Still, the ruleset was chosen by default (by CRUSH?), and it was not working...?

Thanks && regards
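PS: for completeness, the sequence that unblocked things was roughly this (reconstructed; pool and profile names as above):

[root@ceph0 ~]# ceph osd pool delete testec testec --yes-i-really-really-mean-it
[root@ceph0 ~]# ceph osd crush rule rm testec
[root@ceph0 ~]# ceph osd pool create testec 512 512 erasure erasurep4_2_host

After that, the freshly generated "testec" rule had max_size 6 (= k+m) and the PGs were created.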