Whoops, sent that too early. Let me try again.
On Wed, Jun 7, 2017 at 3:24 AM Jonas Jaszkowic <jonasjaszkowic@xxxxxxxxxxxxxx> wrote:
Thank you for your feedback! Do you have more information on why at least k+1 nodes need to be active in order for the cluster to work at this point?
Actually, I misread your email, so my earlier diagnosis was more specific than it should have been. In your case, you've got a 2+3 EC pool and killed 3 OSDs.
Roughly:
We prevent PGs from going active (and serving writes or reads) when they have less than "min_size" OSDs participating. This is generally set so that we have enough redundancy to recover from at least one OSD failing.
In your case, you have 2 OSDs and the failure of either one of them results in the loss of all written data. So we don't let you go active as it's not safe.
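For reference, a quick way to check the value that gates this is the standard pool commands. This is just a sketch: it assumes the pool is named "ecpool" to match the crush rule in the attached info, so substitute the real pool name.

# min_size is the number of participating shards below which a PG won't go active
ceph osd pool get ecpool min_size

# "ceph osd pool ls detail" also shows which erasure-code profile each pool uses,
# so you can compare min_size against k from that profile
ceph osd pool ls detail
ceph osd erasure-code-profile ls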
I am particularly interested in any material on the erasure coding implementations in Ceph and how they work in depth. Sometimes the official documentation doesn’t supply the needed information on problems beyond the point of a default cluster setup. Is there any technical documentation of the implementation or something similar?
http://docs.ceph.com/docs/master/dev/osd_internals/erasure_coding/ and the pages it links to
-Greg
Any help is appreciated.

Best regards,
Jonas

On 07.06.2017 at 08:00, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

On Tue, Jun 6, 2017 at 10:12 AM, Jonas Jaszkowic
<jonasjaszkowic@xxxxxxxxxxxxxx> wrote:
I set up a simple Ceph cluster with 5 OSD nodes and 1 monitor node. Each OSD
is on a different host.
The erasure coded pool has 64 PGs and an initial state of HEALTH_OK.
The goal is to deliberately break as many OSDs as possible up to the number
of coding chunks m in order to
evaluate the read performance when these chunks are missing. Per definition
of Reed-Solomon coding, any m chunks out of the n=k+m total chunks can be missing. To simulate the loss of
an OSD I’m doing the following:
ceph osd set noup
ceph osd down <ID>
ceph osd out <ID>
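As a side note, before measuring read performance it is worth confirming that the simulated failures actually took effect; the standard status commands are enough for that, e.g.:

# check that the killed OSDs are really reported down/out
ceph osd tree
ceph osd stat

# list the PGs that went inactive or degraded as a result
ceph pg dump_stuck inactive
ceph pg dump_stuck degraded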
With the above procedure I should be able to kill up to m = 3 OSDs without
losing any data. However, when I kill m = 3 randomly selected OSDs,
all requests to the cluster are blocked and HEALTH_ERR is showing. The OSD
on which the requests are blocked is working properly and [in,up] in the
cluster.
My question: Why is it not possible to kill m = 3 OSDs and still operate the
cluster? Isn’t that equivalent to losing data, which
shouldn’t happen in this particular configuration? Is my cluster setup
properly or am I missing something?
Sounds like http://tracker.ceph.com/issues/18749, which, yeah, we need
to fix. By default, with a k+m EC code, a PG currently insists on
having at least one chunk more than the minimum k before it will go active.
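If the goal is just to keep serving reads for the benchmark, one possible workaround (a sketch only, assuming the pool is named "ecpool"; adjust to the real pool name) is to lower the pool's min_size to k. Note this is exactly the unsafe situation described above: with only k shards left, one more failure loses all written data.

# allow PGs to go active with only k=2 shards present (no remaining redundancy)
ceph osd pool set ecpool min_size 2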
-Greg
Thank you for your help!
I have attached all relevant information about the cluster and status
outputs:
Erasure coding profile:
jerasure-per-chunk-alignment=false
k=2
m=3
plugin=jerasure
ruleset-failure-domain=host
ruleset-root=default
technique=reed_sol_van
w=8
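For anyone reproducing this setup, a profile and pool like the above would typically be created along these lines. The profile name "myprofile" and pool name "ecpool" are placeholders here, not the names from the original setup:

# create a 2+3 jerasure profile with host as the failure domain
ceph osd erasure-code-profile set myprofile \
    k=2 m=3 plugin=jerasure technique=reed_sol_van \
    ruleset-failure-domain=host

# create an erasure-coded pool with 64 PGs using that profile
ceph osd pool create ecpool 64 64 erasure myprofile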
Content of ceph.conf:
[global]
fsid = 6353b831-22c3-424c-a8f1-495788e6b4e2
mon_initial_members = ip-172-31-27-142
mon_host = 172.31.27.142
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
osd_pool_default_min_size = 2
osd_pool_default_size = 2
mon_allow_pool_delete = true
Crush rule:
rule ecpool {
ruleset 1
type erasure
min_size 2
max_size 5
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
Output of 'ceph -s' while cluster is degraded:
cluster 6353b831-22c3-424c-a8f1-495788e6b4e2
health HEALTH_ERR
38 pgs are stuck inactive for more than 300 seconds
26 pgs degraded
38 pgs incomplete
26 pgs stuck degraded
38 pgs stuck inactive
64 pgs stuck unclean
26 pgs stuck undersized
26 pgs undersized
2 requests are blocked > 32 sec
recovery 3/5 objects degraded (60.000%)
recovery 1/5 objects misplaced (20.000%)
noup flag(s) set
monmap e2: 1 mons at {ip-172-31-27-142=172.31.27.142:6789/0}
election epoch 6, quorum 0 ip-172-31-27-142
mgr no daemons active
osdmap e194: 5 osds: 2 up, 2 in; 64 remapped pgs
flags noup,sortbitwise,require_jewel_osds,require_kraken_osds
pgmap v970: 64 pgs, 1 pools, 592 bytes data, 1 objects
79668 kB used, 22428 MB / 22505 MB avail
3/5 objects degraded (60.000%)
1/5 objects misplaced (20.000%)
38 incomplete
15 active+undersized+degraded
11 active+undersized+degraded+remapped
Output of 'ceph health' while cluster is degraded:
HEALTH_ERR 38 pgs are stuck inactive for more than 300 seconds; 26 pgs
degraded; 38 pgs incomplete; 26 pgs stuck degraded; 38 pgs stuck inactive;
64 pgs stuck unclean; 26 pgs stuck undersized; 26 pgs undersized; 2 requests
are blocked > 32 sec; recovery 3/5 objects degraded (60.000%); recovery 1/5
objects misplaced (20.000%); noup flag(s) set
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com