Re: Requests blocked in degraded erasure coded pool

Gregory Farnum <gfarnum@xxxxxxxxxx> · Wed, 07 Jun 2017 20:02:19 +0000

On Wed, Jun 7, 2017 at 12:59 PM Jonas Jaszkowic <jonasjaszkowic@xxxxxxxxxxxxxx> wrote:
If you set min_size 2 before taking the OSDs down, that does seem odd.

I think I don’t fully understand the concept of min_size in the CRUSH ruleset. The documentation (http://docs.ceph.com/docs/master/rados/operations/crush-map/) states:

min_size
Description:	If a pool makes fewer replicas than this number, CRUSH will NOT select this rule.
Type:		Integer
Purpose:		A component of the rule mask.
Required:	Yes
Default:		1

Assuming that I want my scenario to work (5 OSDs, 2+3 EC pool, 3 OSDs down, still reading my data), how exactly do I have to configure my pool? Or is this simply not possible at this point?

The CRUSH rule min_size is a completely different thing from the pool min_size. If you set the pool min_size to 2 I *think* it will do what you expect.

But in general running with min_size == k is not a wise way to run the cluster as you don't have any redundancy in the case of losses. :)
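The two settings called min_size can be contrasted in a few lines. The following is a simplified Python model, not Ceph source: per the documentation quoted above, the CRUSH rule's min_size (together with max_size) only decides whether a rule may be selected for a pool of a given size, while the pool's min_size decides whether a PG goes active given how many shards are up. The class and method names are hypothetical, and the activation check deliberately ignores the rest of Ceph's peering logic.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of the two "min_size" settings in this
# thread. Not Ceph code; all names are illustrative only.


@dataclass
class CrushRuleMask:
    # CRUSH rule mask fields: the rule is only eligible for pools whose
    # size (replica count, or k+m for an EC pool) lies in this range.
    min_size: int
    max_size: int

    def applies_to(self, pool_size: int) -> bool:
        return self.min_size <= pool_size <= self.max_size


@dataclass
class ErasurePool:
    k: int         # data chunks
    m: int         # coding chunks
    min_size: int  # pool min_size: shards that must be up for a PG to go active

    @property
    def size(self) -> int:
        return self.k + self.m

    def pg_can_go_active(self, up_shards: int) -> bool:
        # Simplification: a PG serves I/O (reads AND writes) only when at
        # least min_size shards participate.
        return up_shards >= self.min_size

    def data_recoverable(self, up_shards: int) -> bool:
        # Any k of the k+m shards suffice to reconstruct the data.
        return up_shards >= self.k
```

In this model, the 2+3 pool with 3 of 5 OSDs down has 2 shards up: data_recoverable(2) is True, but a pool min_size of 3 keeps the PG inactive, whereas a pool min_size of 2 lets it go active with no redundancy left. The rule mask never looks at how many OSDs are up at all.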

I just want to be sure that I have no errors in my configuration.

Yeah, we just don't have a way of serving reads without serving writes at the moment. It's a limit of the architecture.

Thank you, this is good to know, particularly because I didn’t find anything about it in the documentation.

- Jonas

On 07.06.2017 at 21:40, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

On Wed, Jun 7, 2017 at 12:30 PM Jonas Jaszkowic <jonasjaszkowic@xxxxxxxxxxxxxx> wrote:

On 07.06.2017 at 20:29, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

We prevent PGs from going active (and serving writes or reads) when they have less than "min_size" OSDs participating. This is generally set so that we have enough redundancy to recover from at least one OSD failing.

Do you mean the min_size value from the CRUSH rule? I set min_size = 2, so a 2+3 EC pool with 3 killed OSDs still has the minimum of 2 OSDs and should be able to fully recover its data, right?

If you set min_size 2 before taking the OSDs down, that does seem odd.

In your case, you have 2 OSDs and the failure of either one of them results in the loss of all written data. So we don't let you go active as it's not safe.
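The arithmetic behind this can be sketched as follows. This is a hypothetical illustration, not Ceph code: it assumes only the erasure-code property that any k of the k+m shards can rebuild an object, and counts how many further failures a PG can absorb as OSDs of the 2+3 pool go down one at a time. The helper name extra_failures_tolerated is made up for this example.

```python
def extra_failures_tolerated(k: int, up_shards: int) -> int:
    """Hypothetical helper: further shard losses a PG can absorb.

    Assumes any k of the k+m shards can rebuild an object, so the
    margin is up_shards - k. A negative value means the data is
    already unrecoverable.
    """
    return up_shards - k


if __name__ == "__main__":
    # The 2+3 pool from this thread, losing OSDs one at a time.
    K, M = 2, 3
    for down in range(M + 2):
        up = K + M - down
        margin = extra_failures_tolerated(K, up)
        status = "data lost" if margin < 0 else f"can lose {margin} more"
        print(f"{down} down, {up} up: {status}")
```

With 3 OSDs down the margin reaches 0, which is the case described here: the failure of either remaining OSD loses the data.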

I get that it makes no sense to serve writes at this point because we cannot provide the desired redundancy, but how is refusing to go active safer than just serving reads? What bugs me is that, by definition of the erasure code used, we should be able to lose 3 OSDs and still get our data back, which is not the case in this scenario because the cluster refuses to go active.

Yeah, we just don't have a way of serving reads without serving writes at the moment. It's a limit of the architecture.

-Greg
PS: please keep this on the list. It spreads the information and archives it for future reference by others. :)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com