On Tue, Jun 6, 2017 at 10:12 AM, Jonas Jaszkowic
<jonasjaszkowic@xxxxxxxxxxxxxx> wrote:
> I set up a simple Ceph cluster with 5 OSD nodes and 1 monitor node. Each
> OSD is on a different host.
> The erasure coded pool has 64 PGs and an initial state of HEALTH_OK.
>
> The goal is to deliberately break as many OSDs as possible, up to the
> number of coding chunks m, in order to evaluate the read performance when
> these chunks are missing. Per the definition of Reed-Solomon coding, any m
> chunks out of the n=k+m total chunks can be missing. To simulate the loss
> of an OSD I’m doing the following:
>
> ceph osd set noup
> ceph osd down <ID>
> ceph osd out <ID>
>
> With the above procedure I should be able to kill up to m = 3 OSDs without
> losing any data. However, when I kill m = 3 randomly selected OSDs, all
> requests to the cluster are blocked and HEALTH_ERR is showing. The OSD on
> which the requests are blocked is working properly and [in,up] in the
> cluster.
>
> My question: Why is it not possible to kill m = 3 OSDs and still operate
> the cluster? Isn’t that equivalent to losing data, which shouldn’t happen
> in this particular configuration? Is my cluster set up properly or am I
> missing something?

Sounds like http://tracker.ceph.com/issues/18749, which, yeah, we need to
fix. By default, with a k+m EC code, it currently insists on at least one
chunk more than the minimum k to go active.
-Greg
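
For anyone who runs into the same behaviour: one thing worth inspecting is
the pool's min_size, the setting that controls how many shards a PG must
have before the pool accepts I/O. A rough sketch of the relevant commands
(the pool name "ecpool" is assumed from the crush rule quoted below;
whether lowering min_size actually works around the tracker issue above is
not confirmed):

    # show how many shards the EC pool currently requires to go active
    ceph osd pool get ecpool min_size

    # optionally lower it to k (k=2 in the quoted profile); serving I/O
    # from exactly k shards leaves no redundancy margin, so treat this as
    # a temporary measure for testing only
    ceph osd pool set ecpool min_size 2

    # re-check the PG states afterwards
    ceph health detail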
>
> Thank you for your help!
>
> I have attached all relevant information about the cluster and status
> outputs:
>
> Erasure coding profile:
>
> jerasure-per-chunk-alignment=false
> k=2
> m=3
> plugin=jerasure
> ruleset-failure-domain=host
> ruleset-root=default
> technique=reed_sol_van
> w=8
>
> Content of ceph.conf:
>
> [global]
> fsid = 6353b831-22c3-424c-a8f1-495788e6b4e2
> mon_initial_members = ip-172-31-27-142
> mon_host = 172.31.27.142
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> osd_pool_default_min_size = 2
> osd_pool_default_size = 2
> mon_allow_pool_delete = true
>
> Crush rule:
>
> rule ecpool {
>     ruleset 1
>     type erasure
>     min_size 2
>     max_size 5
>     step set_chooseleaf_tries 5
>     step set_choose_tries 100
>     step take default
>     step chooseleaf indep 0 type host
>     step emit
> }
>
> Output of 'ceph -s' while cluster is degraded:
>
>     cluster 6353b831-22c3-424c-a8f1-495788e6b4e2
>      health HEALTH_ERR
>             38 pgs are stuck inactive for more than 300 seconds
>             26 pgs degraded
>             38 pgs incomplete
>             26 pgs stuck degraded
>             38 pgs stuck inactive
>             64 pgs stuck unclean
>             26 pgs stuck undersized
>             26 pgs undersized
>             2 requests are blocked > 32 sec
>             recovery 3/5 objects degraded (60.000%)
>             recovery 1/5 objects misplaced (20.000%)
>             noup flag(s) set
>      monmap e2: 1 mons at {ip-172-31-27-142=172.31.27.142:6789/0}
>             election epoch 6, quorum 0 ip-172-31-27-142
>         mgr no daemons active
>      osdmap e194: 5 osds: 2 up, 2 in; 64 remapped pgs
>             flags noup,sortbitwise,require_jewel_osds,require_kraken_osds
>       pgmap v970: 64 pgs, 1 pools, 592 bytes data, 1 objects
>             79668 kB used, 22428 MB / 22505 MB avail
>             3/5 objects degraded (60.000%)
>             1/5 objects misplaced (20.000%)
>                   38 incomplete
>                   15 active+undersized+degraded
>                   11 active+undersized+degraded+remapped
>
> Output of 'ceph health' while cluster is degraded:
>
> HEALTH_ERR 38 pgs are stuck inactive for more than 300 seconds; 26 pgs
> degraded; 38 pgs incomplete; 26 pgs stuck degraded; 38 pgs stuck inactive;
> 64 pgs stuck unclean; 26 pgs stuck undersized; 26 pgs undersized;
> 2 requests are blocked > 32 sec; recovery 3/5 objects degraded (60.000%);
> recovery 1/5 objects misplaced (20.000%); noup flag(s) set
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
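
For reference, a minimal sketch of reproducing the setup and failure
simulation described in the quoted message. The profile name "ec-k2m3" and
the pool name "ecpool" are illustrative assumptions; the k/m values,
failure domain, and PG count are taken from the quoted profile and crush
rule:

    # create an erasure-code profile with k=2 data and m=3 coding chunks,
    # spreading chunks across hosts as in the quoted profile
    ceph osd erasure-code-profile set ec-k2m3 \
        k=2 m=3 plugin=jerasure technique=reed_sol_van \
        ruleset-failure-domain=host

    # create an erasure-coded pool with 64 PGs using that profile
    ceph osd pool create ecpool 64 64 erasure ec-k2m3

    # simulate the loss of m=3 OSDs (ids chosen arbitrarily from the 5),
    # following the quoted procedure
    ceph osd set noup            # keep marked-down OSDs from rejoining
    ceph osd down 1 && ceph osd out 1
    ceph osd down 3 && ceph osd out 3
    ceph osd down 4 && ceph osd out 4

    # observe the cluster and PG states while degraded
    ceph -s
    ceph pg dump_stuck inactive

    # undo the simulation
    ceph osd unset noup
    ceph osd in 1 && ceph osd in 3 && ceph osd in 4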