I set up a simple Ceph cluster with 5 OSD nodes and 1 monitor node. Each OSD is on a different host. The erasure-coded pool has 64 PGs and an initial state of HEALTH_OK.

The goal is to deliberately break as many OSDs as possible, up to the number of coding chunks m, in order to evaluate the read performance when these chunks are missing. By definition of Reed-Solomon coding, any m chunks out of the n = k + m total chunks can be missing.

To simulate the loss of an OSD I'm doing the following:

ceph osd set noup
ceph osd down <ID>
ceph osd out <ID>

With the above procedure I should be able to kill up to m = 3 OSDs without losing any data. However, when I kill m = 3 randomly selected OSDs, all requests to the cluster are blocked and HEALTH_ERR is shown. The OSD on which the requests are blocked is working properly and [in,up] in the cluster.

My question: Why is it not possible to kill m = 3 OSDs and still operate the cluster? Isn't that equivalent to losing data, which shouldn't happen in this particular configuration? Is my cluster set up properly, or am I missing something?

Thank you for your help! I have attached all relevant information about the cluster and status outputs:

Erasure coding profile:

jerasure-per-chunk-alignment=false
k=2
m=3
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8

Content of ceph.conf:

[global]
fsid = 6353b831-22c3-424c-a8f1-495788e6b4e2
mon_initial_members = ip-172-31-27-142
mon_host = 172.31.27.142
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_min_size = 2
osd_pool_default_size = 2
mon_allow_pool_delete = true

Crush rule:

rule ecpool {
        ruleset 1
        type erasure
        min_size 2
        max_size 5
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        step chooseleaf indep 0 type host
        step emit
}

Output of 'ceph -s' while cluster is degraded:

    cluster 6353b831-22c3-424c-a8f1-495788e6b4e2
     health HEALTH_ERR
            38 pgs are stuck inactive for more than 300 seconds
            26 pgs degraded
            38 pgs incomplete
            26 pgs stuck degraded
            38 pgs stuck inactive
            64 pgs stuck unclean
            26 pgs stuck undersized
            26 pgs undersized
            2 requests are blocked > 32 sec
            recovery 3/5 objects degraded (60.000%)
            recovery 1/5 objects misplaced (20.000%)
            noup flag(s) set
     monmap e2: 1 mons at {ip-172-31-27-142=172.31.27.142:6789/0}
            election epoch 6, quorum 0 ip-172-31-27-142
        mgr no daemons active
     osdmap e194: 5 osds: 2 up, 2 in; 64 remapped pgs
            flags noup,sortbitwise,require_jewel_osds,require_kraken_osds
      pgmap v970: 64 pgs, 1 pools, 592 bytes data, 1 objects
            79668 kB used, 22428 MB / 22505 MB avail
            3/5 objects degraded (60.000%)
            1/5 objects misplaced (20.000%)
                  38 incomplete
                  15 active+undersized+degraded
                  11 active+undersized+degraded+remapped

Output of 'ceph health' while cluster is degraded:

HEALTH_ERR 38 pgs are stuck inactive for more than 300 seconds; 26 pgs degraded; 38 pgs incomplete; 26 pgs stuck degraded; 38 pgs stuck inactive; 64 pgs stuck unclean; 26 pgs stuck undersized; 26 pgs undersized; 2 requests are blocked > 32 sec; recovery 3/5 objects degraded (60.000%); recovery 1/5 objects misplaced (20.000%); noup flag(s) set
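In case it is relevant, the profile and pool were created with commands along these lines (I am reconstructing this from memory: the profile name "ecprofile" is a placeholder and the pool name is taken from the crush rule above, so the exact invocation may have differed):

# define a k=2, m=3 Reed-Solomon profile with host failure domain
ceph osd erasure-code-profile set ecprofile \
    k=2 m=3 \
    plugin=jerasure technique=reed_sol_van \
    ruleset-failure-domain=host
# create the erasure-coded pool with 64 PGs using that profile
ceph osd pool create ecpool 64 64 erasure ecprofile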
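And this is the small loop I run to take the three OSDs down (the IDs 0, 2 and 4 are only an example; in the actual test the three OSDs were picked at random):

# keep downed OSDs from being marked up again automatically
ceph osd set noup
for id in 0 2 4; do
    ceph osd down $id   # mark the OSD down
    ceph osd out $id    # remove it from data placement
done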