On Tue, Jun 24, 2014 at 9:12 AM, Shayan Saeed <shayansaeed93@xxxxxxxxx> wrote:
> I assumed that creating a large number of pools might not be scalable.
> If there is no overhead in creating as many pools as I want within an
> OSD, I would probably choose this option.

There is an overhead per PG, and pools create PGs, but OSDs expect to
hold hundreds of PGs and can generally handle several thousand.

> I just want to specify that systematic chunks should be among 'a' racks
> while distributing others among 'b' racks. The only problem is that I
> want to do this for every incoming file (the k and m for erasure-coded
> files can vary too) to the cluster, and while there are around 10 racks,
> the various combinations might grow to be quite large, which would make
> the CRUSH map file huge.

Well, you specify the EC rules to use on a per-pool basis. You *really*
aren't going to be able to change this so that a pool contains objects
of different encoding schemes; the encoding is inherent in how many OSDs
are members of the PG, etc. However, it's quite simple to specify a
group of OSDs which are used for the data chunks, and a separate group
of OSDs used for the parity chunks. Just set up separate CRUSH map roots
for each, and then do multiple take...emit steps within the rule
(there's a rough sketch of such a rule at the bottom of this mail).

> Would this affect my performance if the number of pools and CRUSH rules
> grows abnormally large?
>
> I might go for this option if there is no prohibitive trade-off and/or
> changing the source code for this proves really challenging.

The source changes you're talking about will prove really challenging. ;)
-Greg

>
> Regards,
> Shayan Saeed
>
>
> On Tue, Jun 24, 2014 at 11:37 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Tue, Jun 24, 2014 at 8:29 AM, Shayan Saeed <shayansaeed93@xxxxxxxxx> wrote:
>>> Hi,
>>>
>>> The CRUSH placement algorithm works really nicely with replication.
>>> However, with erasure code, my cluster has some issues which require
>>> making changes that I cannot specify with CRUSH maps. Sometimes,
>>> depending on the type of data, I would like to place objects on
>>> different OSDs but in the same pool.
>>
>> Why do you want to keep the data in the same pool?
>>
>>>
>>> I realize that to disable the CRUSH placement algorithm and replace it
>>> with my own custom algorithm, such as a random placement algorithm or
>>> any other, I have to make changes in the source code. I want to ask if
>>> there is an easy way to do this without going into every code file,
>>> looking for where the mapping from objects to PGs is done, and changing
>>> that. Is there some configuration option which disables CRUSH and
>>> points to my own placement algorithm file for doing custom placement?
>>
>> What you're asking for really doesn't sound feasible, but the thing
>> that comes closest would probably be resurrecting the "pg preferred"
>> mechanisms in CRUSH and the Ceph codebase. You'll have to go back
>> through the git history to find it, but once upon a time we supported
>> a mechanism that let you specify a specific OSD you wanted a
>> particular object to live on, and then it would place the remaining
>> replicas using CRUSH.
>> -Greg
>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>>>
>>> Let me know about the neatest way to go about it. Appreciate any help
>>> I can get.
>>>
>>> Regards,
>>> Shayan Saeed
>>> Research Assistant, Systems Research Lab
>>> University of Illinois Urbana-Champaign
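
P.S. Here's a rough sketch of the kind of rule I mean, assuming a 4+2
encoding, rack as the failure domain, and two roots named "ec_data" and
"ec_parity" that you've already defined in your CRUSH map; all of the
names and numbers here are placeholders, so adjust them to your layout:

rule ec_split_racks {
        # pick any ruleset id that isn't already in use
        ruleset 2
        type erasure
        min_size 3
        max_size 20
        # extra retries help erasure rules find enough distinct OSDs
        step set_chooseleaf_tries 5
        # first pass: one OSD in each of 4 racks under the "data" root
        step take ec_data
        step chooseleaf indep 4 type rack
        step emit
        # second pass: one OSD in each of 2 racks under the "parity" root
        step take ec_parity
        step chooseleaf indep 2 type rack
        step emit
}

The two emits just get concatenated, so the first four positions of each
PG come out of ec_data and the last two out of ec_parity; with the
default jerasure plugin the first k chunks are the systematic ones, which
is what you're after. You'd then point an erasure-coded pool at it with
something like "ceph osd pool create ecpool 128 128 erasure myprofile
ec_split_racks", where myprofile is an erasure-code-profile created with
k=4 and m=2 (the exact CLI syntax may differ a bit between releases).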