Re: Disabling CRUSH for erasure code and doing custom placement

Another question along the same lines. For erasure-coded objects, just
as for replicated ones, every request goes through the primary OSD.
Isn't it possible to send the request to any member of the acting set
and get the object? While this may have kept things simpler on the
development side and makes some sense for a replicated system, it hurts
availability and load balancing for erasure-coded objects. I sometimes
see so many requests coming in for a specific object that the primary
OSD hosting it goes down, and then all requests have to wait until
another OSD comes up and the repair is done. For load-balancing
purposes, is there a way to direct requests to another OSD and get the
object without waiting for the repair?
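
For a replicated pool, librados does expose a per-operation flag that
lets a read be served by a non-primary OSD instead of always hitting
the primary. A minimal sketch with the librados C API follows; the pool
and object names are made up, and note that this flag does not help an
erasure-coded pool, where the primary still has to gather the shards
and decode them:

#include <stdio.h>
#include <rados/librados.h>

int main(void)
{
    rados_t cluster;
    rados_ioctx_t io;
    char buf[4096];
    size_t bytes_read = 0;
    int rval = 0;

    rados_create(&cluster, "admin");        /* connect as client.admin */
    rados_conf_read_file(cluster, NULL);    /* default ceph.conf search path */
    rados_connect(cluster);
    rados_ioctx_create(cluster, "my-replicated-pool", &io);

    rados_read_op_t op = rados_create_read_op();
    rados_read_op_read(op, 0, sizeof(buf), buf, &bytes_read, &rval);
    /* BALANCE_READS allows the read to be served by any replica */
    int r = rados_read_op_operate(op, io, "my-object",
                                  LIBRADOS_OPERATION_BALANCE_READS);
    rados_release_read_op(op);
    printf("read returned %d, %zu bytes\n", r, bytes_read);

    rados_ioctx_destroy(io);
    rados_shutdown(cluster);
    return 0;
}

There is also a LIBRADOS_OPERATION_LOCALIZE_READS flag that tries to
read from a nearby replica, but neither flag changes how erasure-coded
objects are served.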

Thanks,
Shayan Saeed


On Tue, Jul 15, 2014 at 1:18 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> One of Ceph's design tentpoles is *avoiding* a central metadata lookup
> table. The Ceph MDS maintains a filesystem hierarchy but doesn't
> really handle the sort of thing you're talking about, either. If you
> want some kind of lookup, you'll need to build it yourself — although
> you could make use of some RADOS features to do it, if you really
> wanted to. (For instance, depending on scale you could keep an index
> of objects in an omap somewhere.)
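
A minimal sketch of that omap-index idea with the librados C API; the
index object name "placement_index" and the helper are made up. After
writing an object into whichever pool was chosen for it, record object
name -> pool name in the omap of a well-known index object:

#include <string.h>
#include <rados/librados.h>

/* index_io is an ioctx for a small pool that holds only the index object. */
int record_placement(rados_ioctx_t index_io, const char *obj, const char *pool)
{
    const char *keys[1] = { obj };
    const char *vals[1] = { pool };
    size_t lens[1]      = { strlen(pool) };

    rados_write_op_t op = rados_create_write_op();
    rados_write_op_omap_set(op, keys, vals, lens, 1);
    int r = rados_write_op_operate(op, index_io, "placement_index", NULL, 0);
    rados_release_write_op(op);
    return r;
}

At larger scale the index could be sharded across several index objects
(for example by hashing the object name) so a single omap does not grow
without bound.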
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Tue, Jul 15, 2014 at 10:11 AM, Shayan Saeed <shayansaeed93@xxxxxxxxx> wrote:
>> Well, I did end up putting the data in different pools to get custom
>> placement. However, I ran into trouble during retrieval: the messy way
>> is to query every pool to check where the data is stored, which
>> requires many round trips to machines in far-off racks. Could this
>> information be kept in some sort of centralized metadata server? I
>> understand that the MDS is not used for a plain object store, but is
>> there a way to use it for faster lookups?
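
Assuming the kind of omap index sketched in the reply above, retrieval
becomes a single lookup against a known index object instead of probing
every pool. A rough sketch with the librados C API (again, the
"placement_index" object and the helper are made up):

#include <errno.h>
#include <stdio.h>
#include <rados/librados.h>

/* Look up which pool holds obj; copies the pool name into pool_out. */
int lookup_pool(rados_ioctx_t index_io, const char *obj,
                char *pool_out, size_t pool_out_len)
{
    const char *keys[1] = { obj };
    rados_omap_iter_t iter = NULL;
    int prval = 0;

    rados_read_op_t op = rados_create_read_op();
    rados_read_op_omap_get_vals_by_keys(op, keys, 1, &iter, &prval);
    int r = rados_read_op_operate(op, index_io, "placement_index", 0);
    if (r == 0)
        r = prval;
    if (r == 0) {
        char *key = NULL, *val = NULL;
        size_t len = 0;
        if (rados_omap_get_next(iter, &key, &val, &len) == 0 && val != NULL)
            /* omap values are not NUL-terminated, so copy with a length */
            snprintf(pool_out, pool_out_len, "%.*s", (int)len, val);
        else
            r = -ENOENT;   /* object name not present in the index */
        rados_omap_get_end(iter);
    }
    rados_release_read_op(op);
    return r;
}

The caller can then open an ioctx for the returned pool and read the
object directly, at the cost of one extra round trip to the index
object.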
>>
>> Regards,
>> Shayan Saeed
>>
>>
>> On Tue, Jun 24, 2014 at 11:37 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>>> On Tue, Jun 24, 2014 at 8:29 AM, Shayan Saeed <shayansaeed93@xxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> CRUSH placement works really nicely with replication. However, with
>>>> erasure coding, my cluster has some issues that require placement
>>>> changes I cannot express with CRUSH maps. Sometimes, depending on the
>>>> type of data, I would like to place objects on different OSDs while
>>>> keeping them in the same pool.
>>>
>>> Why do you want to keep the data in the same pool?
>>>
>>>>
>>>> I realize that to disable the CRUSH placement algorithm and replace
>>>> it with my own custom algorithm, such as random placement or anything
>>>> else, I have to change the source code. Is there an easy way to do
>>>> this without going through every source file, finding where the
>>>> mapping from objects to PGs is done, and changing it? Is there a
>>>> configuration option that disables CRUSH and points to my own
>>>> placement code?
>>>
>>> What you're asking for really doesn't sound feasible, but the thing
>>> that comes closest would probably be resurrecting the "pg preferred"
>>> mechanisms in CRUSH and the Ceph codebase. You'll have to go back
>>> through the git history to find it, but once upon a time we supported
>>> a mechanism that let you specify a specific OSD you wanted a
>>> particular object to live on, and then it would place the remaining
>>> replicas using CRUSH.
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>
>>>>
>>>> Let me know the neatest way to go about this. I appreciate any help
>>>> I can get.
>>>>
>>>> Regards,
>>>> Shayan Saeed
>>>> Research Assistant, Systems Research Lab
>>>> University of Illinois Urbana-Champaign



