Re: Locally repairable code description revisited (was Pyramid ...)

I'm finding that I don't really understand how the LRC specification
works.  Is there a doc somewhere I can read?
-Sam

On Mon, Jun 9, 2014 at 1:18 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
> On Fri, Jun 6, 2014 at 7:30 AM, Loic Dachary <loic@xxxxxxxxxxx> wrote:
>> Hi Andreas,
>>
>> On 06/06/2014 13:46, Andreas Joachim Peters wrote:
>>> Hi Loic,
>>> the basic implementation looks very clean.
>>>
>>> I have few comments/ideas:
>>>
>>> - the reconstruction strategy using the three levels is certainly efficient enough for standard cases, but it does not always guarantee minimum-cost decoding (in cases where one layer is not enough to reconstruct), since your third algorithm is just brute force: it reconstructs everything through all layers until we have what we need ...
>>
>> The third strategy is indeed brute force. Do you think it is worth changing it to be minimal? It would be nice to quantify the percentage of cases it addresses. Do you know how to do that? It looks like a very small percentage, but there is no proof that it is small ;-)
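One way to get the percentage Loic asks about is exhaustive enumeration: walk every loss pattern up to the code's tolerance and check it against a model of what a single layer can repair. The sketch below is purely illustrative (the function name, the toy repair predicate, and the parameters are all made up for this example, not the plugin's real logic):

```python
# Hypothetical sketch: estimate the fraction of loss patterns that no
# single layer can repair on its own, i.e. the cases that fall through
# to the brute-force third strategy.
from itertools import combinations

def fraction_needing_brute_force(n, max_lost, layer_can_repair):
    """n: total chunks; max_lost: largest loss count to consider;
    layer_can_repair(lost) -> True if some single layer repairs 'lost'."""
    total = hard = 0
    for k in range(1, max_lost + 1):
        for lost in combinations(range(n), k):
            total += 1
            if not layer_can_repair(set(lost)):
                hard += 1
    return hard / total

# Toy model: a local group {0,1,2} with one parity can repair any single
# loss inside it; everything else needs the brute-force strategy.
toy = lambda lost: len(lost) == 1 and lost <= {0, 1, 2}
print(fraction_needing_brute_force(4, 2, toy))  # 0.7
```

With a real `layer_can_repair` derived from the LRC description, the same loop would give the exact percentage rather than a toy number.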
>>
>>> - the whole LRC configuration does not actually describe the placement - it still looks disconnected from the placement strategy/crush rules ... wouldn't it make sense to have the crush rule implicit in the description, or a function to derive it automatically from the LRC configuration? Maybe you have already done this another way and I didn't see it ...
>>
>> Good catch.
>>
>> What about this:
>>
>>       "  [ \"_aAAA_aAA_\", \"set choose datacenter 2\","
>>       "    \"_aXXX_aXX_\" ],"
>>       "  [ \"b_BBB_____\", \"set choose host 5\","
>>       "    \"baXXX_____\" ],"
>>       "  [ \"_____cCCC_\", \"\","
>>       "    \"baXXXcaXX_\" ],"
>>       "  [ \"_____DDDDd\", \"\","
>>       "    \"baXXXcaXXd\" ],"
>>
>> Which translates into
>>
>> take root
>> set choose datacenter 2
>> set choose host 5
>>
>> In other words, the ruleset is created by concatenating the strings from the description, without any kind of smart computation. It is up to the person who writes the description to place each ruleset fragment next to a layer where it makes sense. There will be minimal checking to make sure the resulting ruleset can actually be used to get the required number of chunks.
>>
>> It would probably be very difficult and very confusing to automate the generation of the ruleset. If it were implicit rather than explicit as above, the operator would have to somehow understand and learn how it is computed to make sure it does what is desired. With an explicit set of crush rules loosely coupled to the chunk mapping, the operator can read the crush documentation instead of guessing.
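For illustration, the concatenation Loic describes can be sketched like this (`build_crush_rule` and the triple layout are assumptions made for the sketch, not the plugin's actual API):

```python
# Illustrative sketch only: the ruleset is just the non-empty crush step
# strings from the LRC description, concatenated in order after "take root".
# No smart computation happens -- exactly as described above.

def build_crush_rule(layers):
    """layers: list of (mapping, crush_steps, coding) triples taken from
    the LRC description; returns the concatenated crush rule text."""
    steps = ["take root"]
    for _mapping, crush_steps, _coding in layers:
        if crush_steps:  # layers with an empty string contribute no step
            steps.append(crush_steps)
    return "\n".join(steps)

layers = [
    ("_aAAA_aAA_", "set choose datacenter 2", "_aXXX_aXX_"),
    ("b_BBB_____", "set choose host 5",       "baXXX_____"),
    ("_____cCCC_", "",                        "baXXXcaXX_"),
    ("_____DDDDd", "",                        "baXXXcaXXd"),
]
print(build_crush_rule(layers))
# take root
# set choose datacenter 2
# set choose host 5
```

The point of the design is visible in the sketch: the operator, not the code, decides which layer carries which crush step.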
>
> I think I'm missing some context for this discussion (maybe I haven't
> been reading other threads closely enough); can you discuss this in
> more detail?
> Matching up CRUSH rulesets and the EC plugin formulas is very
> important and demonstrated to be difficult, but I don't really
> understand what you're suggesting here, which makes me think it's not
> quite the right idea. ;)
>
>>
>>> - should the plug-in have the ability to select reconstruction based on proximity, or should this be up to the higher layer, which would provide chunks in a way that makes reconstruction select the 'closest' layer? You will understand the relevance of this question better in the next point ...
>>>
>>> - I remember we had this 3 data centre example with (8,4), where you can reconstruct every object as long as 2 data centres are up. Another appealing example, which avoids remote access when reading an object, is to have 2 data centres each holding a replica of, e.g., (4,2)-encoded objects. Can you describe in your LRC configuration language how to store the same chunk twice, like __ABCCBA__ ?
>>
>> Unless I'm mistaken, that would require the caller of the plugin to support duplicate data chunks and to provide a kind of proximity check. Since this is not currently supported by the OSD logic, it is difficult to see how an erasure code plugin could support this use case.
>
> I haven't looked at the EC plugin interface at all, but I thought the
> OSD told the plugin what chunks it could access, and the plugin tells
> it which ones to fetch. So couldn't the plugin simply output duplicate
> chunks, and not have the OSD retrieve both of them?
> -Greg
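Greg's suggestion above could be sketched as follows. Everything here is hypothetical (the function name, the `position -> chunk` layout, and the proximity costs are invented for illustration; this is not the EC plugin interface): given a placement that maps the same logical chunk to two positions, a minimum-to-decode style call would return only one position per logical chunk, preferring the closest available copy.

```python
# Hypothetical sketch: pick one readable copy per logical chunk,
# cheapest (closest) first, so the OSD never fetches both duplicates.

def minimum_to_fetch(position_to_chunk, available, proximity):
    """position_to_chunk: placement position -> logical chunk id
    available: set of positions currently readable
    proximity: position -> cost (lower means closer)
    Returns {chunk id: chosen position}, one entry per logical chunk."""
    best = {}
    for pos in sorted(available, key=proximity):
        chunk = position_to_chunk[pos]
        if chunk not in best:  # keep only the closest copy of each chunk
            best[chunk] = pos
    return best

# __ABCCBA__ : positions 2..7 hold chunks A,B,C,C,B,A (each stored twice)
layout = {2: "A", 3: "B", 4: "C", 5: "C", 6: "B", 7: "A"}
near = {2: 0, 3: 0, 4: 0, 5: 1, 6: 1, 7: 1}  # first datacenter is closer
print(minimum_to_fetch(layout, {2, 3, 4, 5, 6, 7}, near.get))
# {'A': 2, 'B': 3, 'C': 4}
```

If the closer datacenter is down, the same call with the reduced `available` set falls back to the remote copies automatically.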
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html