Requests blocked in degraded erasure coded pool

I set up a simple Ceph cluster with 5 OSD nodes and 1 monitor node; each OSD runs on a different host.
The erasure-coded pool has 64 PGs and an initial state of HEALTH_OK.

The goal is to deliberately take down as many OSDs as possible, up to the number of coding chunks m, in order to
evaluate the read performance while these chunks are missing. By the definition of Reed-Solomon coding, any m
chunks out of the n = k + m total chunks may be missing; with k = 2 and m = 3 here, any 2 of the 5 chunks should be
enough to reconstruct the data. To simulate the loss of an OSD I do the following:

ceph osd set noup
ceph osd down <ID>
ceph osd out <ID>
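
For completeness, when I take out several OSDs at once I run the same commands in a small shell loop; the OSD IDs below are only an example, I pick them at random:

# Example only: take three randomly chosen OSDs (here 1, 3 and 4) down and out,
# with noup set so they cannot rejoin the cluster.
ceph osd set noup
for id in 1 3 4; do
    ceph osd down "$id"
    ceph osd out "$id"
done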

With the above procedure I should be able to kill up to m = 3 OSDs without losing any data. However, when I kill m = 3 randomly selected OSDs,
all requests to the cluster are blocked and HEALTH_ERR is reported. The OSD on which the requests are blocked is working properly and is [up,in] in the cluster.

My question: why is it not possible to kill m = 3 OSDs and still operate the cluster? Isn't that equivalent to losing data, which
shouldn't happen with this particular configuration? Is my cluster set up properly, or am I missing something?

Thank you for your help!

All relevant information about the cluster and the status outputs is included below:

Erasure coding profile:

jerasure-per-chunk-alignment=false
k=2
m=3
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8
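
For reference, the profile and pool were created along these lines (the profile name 'ecprofile' and pool name 'ecpool' are placeholders for the actual names; the pg_num of 64 matches the pool above):

ceph osd erasure-code-profile set ecprofile \
        k=2 m=3 plugin=jerasure technique=reed_sol_van ruleset-failure-domain=host
ceph osd pool create ecpool 64 64 erasure ecprofile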

Content of ceph.conf:

[global]
fsid = 6353b831-22c3-424c-a8f1-495788e6b4e2
mon_initial_members = ip-172-31-27-142
mon_host = 172.31.27.142
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_min_size = 2
osd_pool_default_size = 2
mon_allow_pool_delete = true

Crush rule:

rule ecpool {
        ruleset 1
        type erasure
        min_size 2
        max_size 5
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step chooseleaf indep 0 type host
        step emit
}
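
If it matters, the pool parameters can be cross-checked like this (pool name 'ecpool' assumed, as above):

ceph osd pool get ecpool erasure_code_profile
ceph osd pool get ecpool size
ceph osd pool get ecpool min_size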

Output of 'ceph -s' while the cluster is degraded:

    cluster 6353b831-22c3-424c-a8f1-495788e6b4e2
     health HEALTH_ERR
            38 pgs are stuck inactive for more than 300 seconds
            26 pgs degraded
            38 pgs incomplete
            26 pgs stuck degraded
            38 pgs stuck inactive
            64 pgs stuck unclean
            26 pgs stuck undersized
            26 pgs undersized
            2 requests are blocked > 32 sec
            recovery 3/5 objects degraded (60.000%)
            recovery 1/5 objects misplaced (20.000%)
            noup flag(s) set
     monmap e2: 1 mons at {ip-172-31-27-142=172.31.27.142:6789/0}
            election epoch 6, quorum 0 ip-172-31-27-142
        mgr no daemons active
     osdmap e194: 5 osds: 2 up, 2 in; 64 remapped pgs
            flags noup,sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v970: 64 pgs, 1 pools, 592 bytes data, 1 objects
            79668 kB used, 22428 MB / 22505 MB avail
            3/5 objects degraded (60.000%)
            1/5 objects misplaced (20.000%)
                  38 incomplete
                  15 active+undersized+degraded
                  11 active+undersized+degraded+remapped

Output of 'ceph health' while the cluster is degraded:

HEALTH_ERR 38 pgs are stuck inactive for more than 300 seconds; 26 pgs degraded; 38 pgs incomplete; 26 pgs stuck degraded; 38 pgs stuck inactive; 64 pgs stuck unclean; 26 pgs stuck undersized; 26 pgs undersized; 2 requests are blocked > 32 sec; recovery 3/5 objects degraded (60.000%); recovery 1/5 objects misplaced (20.000%); noup flag(s) set
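
If more detail is helpful, I can also post the output of, for example:

ceph health detail
ceph pg dump_stuck inactive
ceph pg <pgid> query        # for one of the incomplete PGs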
