Re: Requests blocked in degraded erasure coded pool

The CRUSH rule min_size is a completely different thing from the pool min_size. If you set the pool min_size to 2 I *think* it will do what you expect.
If you set min_size 2 before taking the OSDs down, that does seem odd.

Good to know; I got confused because both settings have the same name. I will try to set the correct min_size and see what happens. What did you mean when you said it seems odd
that I set the (correct) min_size = 2 before taking the OSDs down? Isn’t that the way I would do it now?
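For reference, this is roughly what I plan to try; "ecpool" is just a placeholder for my test pool's name:

    # check the pool-level min_size currently in effect
    ceph osd pool get ecpool min_size
    # lower it to k=2 so the PGs can go active with only two shards left
    ceph osd pool set ecpool min_size 2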

But in general running with min_size == k is not a wise way to run the cluster as you don't have any redundancy in the case of losses. :)

I totally agree. I am trying to understand erasure coding in Ceph in depth, and it was kind of strange to have 2 OSDs left but not be able to get my data back. As it looks
now, it was only a configuration issue.

- Jonas

On 07.06.2017, at 22:02, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:



On Wed, Jun 7, 2017 at 12:59 PM Jonas Jaszkowic <jonasjaszkowic@xxxxxxxxxxxxxx> wrote:
If you set min_size 2 before taking the OSDs down, that does seem odd.

I think I don’t get the exact concept of min_size in the CRUSH ruleset. The documentation (http://docs.ceph.com/docs/master/rados/operations/crush-map/) states:

min_size
Description: If a pool makes fewer replicas than this number, CRUSH will NOT select this rule.
Type: Integer
Purpose: A component of the rule mask.
Required: Yes
Default: 1
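If I understand correctly, this is the min_size that lives inside a rule definition in a decompiled CRUSH map, roughly like this (rule name, ruleset id and step values are only illustrative):

    rule ecpool {
            ruleset 1
            type erasure
            min_size 3
            max_size 20
            step set_chooseleaf_tries 5
            step take default
            step chooseleaf indep 0 type osd
            step emit
    }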

Assuming that I want my scenario to work (5 OSDs, 2+3 EC pool, 3 OSDs down, still able to read my data), how exactly do
I have to configure my pool? Or is this simply not possible at this point?

The CRUSH rule min_size is a completely different thing from the pool min_size. If you set the pool min_size to 2 I *think* it will do what you expect.

But in general running with min_size == k is not a wise way to run the cluster as you don't have any redundancy in the case of losses. :)
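For your test scenario, though, a rough sketch would look something like this; profile and pool names are placeholders, the PG count is just an example, and the exact name of the failure-domain option can differ between releases:

    # an EC profile with k=2 data chunks and m=3 coding chunks, spread across OSDs
    ceph osd erasure-code-profile set ec-2-3 k=2 m=3 ruleset-failure-domain=osd
    # create the pool from that profile
    ceph osd pool create ecpool 64 64 erasure ec-2-3
    # then lower the pool min_size to k=2 so it stays active with only two shards
    ceph osd pool set ecpool min_size 2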
 
I just want to be sure that I have no errors in my configuration.

Yeah, we just don't have a way of serving reads without serving writes at the moment. It's a limit of the architecture.

Thank you, this is good to know, particularly because I didn’t find anything about it in the documentation.

- Jonas


On 07.06.2017, at 21:40, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:



On Wed, Jun 7, 2017 at 12:30 PM Jonas Jaszkowic <jonasjaszkowic@xxxxxxxxxxxxxx> wrote:

On 07.06.2017, at 20:29, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

We prevent PGs from going active (and serving writes or reads) when they have less than "min_size" OSDs participating. This is generally set so that we have enough redundancy to recover from at least one OSD failing.
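If you want to see which PGs are being held back, something along these lines should show them (exact output and wording vary by release):

    # cluster health, including undersized/inactive PG warnings
    ceph health detail
    # list the PGs that are stuck in an inactive state
    ceph pg dump_stuck inactive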

Do you mean the min_size value from the CRUSH rule? I set min_size = 2, so a 2+3 EC pool with 3 killed OSDs still has the required minimum of 2 OSDs and should be able
to fully recover the data, right?

If you set min_size 2 before taking the OSDs down, that does seem odd.
 

In your case, you have 2 OSDs and the failure of either one of them results in the loss of all written data. So we don't let you go active as it's not safe.

I get that it makes no sense to serve writes at this point because we cannot provide the desired redundancy, but how is preventing me from going active safer than just serving reads? I think what bugs me is that, by definition of the erasure code used, we should be able to lose 3 OSDs and still get our data back - which is not the case in this scenario because our cluster refuses to go active.

Yeah, we just don't have a way of serving reads without serving writes at the moment. It's a limit of the architecture.
 
-Greg
PS: please keep this on the list. It spreads the information and archives it for future reference by others. :)



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
