On Tue, Jun 6, 2017 at 10:12 AM, Jonas Jaszkowic
<jonasjaszkowic@xxxxxxxxxxxxxx> wrote:
> I set up a simple Ceph cluster with 5 OSD nodes and 1 monitor node. Each
> OSD is on a different host.
> The erasure coded pool has 64 PGs and an initial state of HEALTH_OK.
>
> The goal is to deliberately break as many OSDs as possible, up to the
> number of coding chunks m, in order to evaluate the read performance when
> these chunks are missing. Per the definition of Reed-Solomon coding, any m
> chunks out of the n=k+m total chunks can be missing. To simulate the loss
> of an OSD I’m doing the following:
>
> ceph osd set noup
> ceph osd down <ID>
> ceph osd out <ID>
>
> With the above procedure I should be able to kill up to m = 3 OSDs without
> losing any data. However, when I kill m = 3 randomly selected OSDs, all
> requests to the cluster are blocked and HEALTH_ERR is showing. The OSD on
> which the requests are blocked is working properly and [in,up] in the
> cluster.
>
> My question: Why is it not possible to kill m = 3 OSDs and still operate
> the cluster? Isn’t that equivalent to losing data, which shouldn’t happen
> in this particular configuration? Is my cluster set up properly or am I
> missing something?

Sounds like http://tracker.ceph.com/issues/18749, which, yeah, we need to
fix. By default, with a k+m EC code, it currently insists on at least one
chunk more than the minimum k to go active.
-Greg
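
For anyone who runs into the same behaviour: one thing worth inspecting is
the pool's min_size, the setting that controls how many shards a PG must
have before the pool accepts I/O. A rough sketch of the relevant commands
(the pool name "ecpool" is assumed from the crush rule quoted below;
whether lowering min_size actually works around the tracker issue above is
not confirmed):

    # show how many shards the EC pool currently requires to go active
    ceph osd pool get ecpool min_size

    # optionally lower it to k (k=2 in the quoted profile); serving I/O
    # from exactly k shards leaves no redundancy margin, so treat this as
    # a temporary measure for testing only
    ceph osd pool set ecpool min_size 2

    # re-check the PG states afterwards
    ceph health detail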
>
> Thank you for your help!
>
> I have attached all relevant information about the cluster and status
> outputs:
>
> Erasure coding profile:
>
> jerasure-per-chunk-alignment=false
> k=2
> m=3
> plugin=jerasure
> ruleset-failure-domain=host
> ruleset-root=default
> technique=reed_sol_van
> w=8
>
> Content of ceph.conf:
>
> [global]
> fsid = 6353b831-22c3-424c-a8f1-495788e6b4e2
> mon_initial_members = ip-172-31-27-142
> mon_host = 172.31.27.142
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd_pool_default_min_size = 2
> osd_pool_default_size = 2
> mon_allow_pool_delete = true
>
> Crush rule:
>
> rule ecpool {
>     ruleset 1
>     type erasure
>     min_size 2
>     max_size 5
>     step set_chooseleaf_tries 5
>     step set_choose_tries 100
>     step take default
>     step chooseleaf indep 0 type host
>     step emit
> }
>
> Output of 'ceph -s' while cluster is degraded:
>
>     cluster 6353b831-22c3-424c-a8f1-495788e6b4e2
>      health HEALTH_ERR
>             38 pgs are stuck inactive for more than 300 seconds
>             26 pgs degraded
>             38 pgs incomplete
>             26 pgs stuck degraded
>             38 pgs stuck inactive
>             64 pgs stuck unclean
>             26 pgs stuck undersized
>             26 pgs undersized
>             2 requests are blocked > 32 sec
>             recovery 3/5 objects degraded (60.000%)
>             recovery 1/5 objects misplaced (20.000%)
>             noup flag(s) set
>      monmap e2: 1 mons at {ip-172-31-27-142=172.31.27.142:6789/0}
>             election epoch 6, quorum 0 ip-172-31-27-142
>         mgr no daemons active
>      osdmap e194: 5 osds: 2 up, 2 in; 64 remapped pgs
>             flags noup,sortbitwise,require_jewel_osds,require_kraken_osds
>       pgmap v970: 64 pgs, 1 pools, 592 bytes data, 1 objects
>             79668 kB used, 22428 MB / 22505 MB avail
>             3/5 objects degraded (60.000%)
>             1/5 objects misplaced (20.000%)
>                   38 incomplete
>                   15 active+undersized+degraded
>                   11 active+undersized+degraded+remapped
>
> Output of 'ceph health' while cluster is degraded:
>
> HEALTH_ERR 38 pgs are stuck inactive for more than 300 seconds; 26 pgs
> degraded; 38 pgs incomplete; 26 pgs stuck degraded; 38 pgs stuck inactive;
> 64 pgs stuck unclean; 26 pgs stuck undersized; 26 pgs undersized;
> 2 requests are blocked > 32 sec; recovery 3/5 objects degraded (60.000%);
> recovery 1/5 objects misplaced (20.000%); noup flag(s) set
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
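
For reference, a minimal sketch of reproducing the setup and failure
simulation described in the quoted message. The profile name "ec-k2m3" and
the pool name "ecpool" are illustrative assumptions; the k/m values,
failure domain, and PG count are taken from the quoted profile and crush
rule:

    # create an erasure-code profile with k=2 data and m=3 coding chunks,
    # spreading chunks across hosts as in the quoted profile
    ceph osd erasure-code-profile set ec-k2m3 \
        k=2 m=3 plugin=jerasure technique=reed_sol_van \
        ruleset-failure-domain=host

    # create an erasure-coded pool with 64 PGs using that profile
    ceph osd pool create ecpool 64 64 erasure ec-k2m3

    # simulate the loss of m=3 OSDs (ids chosen arbitrarily from the 5),
    # following the quoted procedure
    ceph osd set noup            # keep marked-down OSDs from rejoining
    ceph osd down 1 && ceph osd out 1
    ceph osd down 3 && ceph osd out 3
    ceph osd down 4 && ceph osd out 4

    # observe the cluster and PG states while degraded
    ceph -s
    ceph pg dump_stuck inactive

    # undo the simulation
    ceph osd unset noup
    ceph osd in 1 && ceph osd in 3 && ceph osd in 4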