Help needed to configure erasure coding LRC plugin

Michel Jouvin <michel.jouvin@xxxxxxxxxxxxxxx> · Tue, 4 Apr 2023 15:26:14 +0200

Hi,

As discussed in another thread (Crushmap rule for multi-datacenter
erasure coding), I'm trying to create an EC pool spanning 3 datacenters
(the datacenters are present in the crushmap). The objective is to be
resilient to 1 DC down, keeping at least read-only access to the pool
and, if possible, read-write access, with a storage efficiency better
than 3 replicas (let's say a storage overhead <= 2).

In that discussion, somebody mentioned the LRC plugin as a possible
alternative to jerasure that would achieve this without tweaking the
crushmap rule to implement the 2-step OSD allocation. I looked at the
documentation
(https://docs.ceph.com/en/latest/rados/operations/erasure-code-lrc/) but
I still have some questions, in case someone has experience or expertise
with this LRC plugin.

I tried to create a rule using 5 OSDs per datacenter (15 in total),
with 3 per datacenter (9 in total) holding data chunks and the others
holding coding chunks. For this, based on my understanding of the
examples, I used k=9, m=3, l=4. Is this right? Is this configuration
equivalent, in terms of redundancy, to a jerasure configuration with
k=9, m=6?
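
For reference, the chunk arithmetic behind this question can be sketched
as below. It assumes my reading of the LRC documentation is correct: in
the simple k/m/l form, k + m must be a multiple of l, and one additional
local parity chunk is created per group of l chunks. The function names
are only illustrative.

```python
def lrc_chunk_count(k: int, m: int, l: int) -> int:
    """Total number of chunks for the simple k/m/l form of an LRC profile.

    Assumption: one extra local parity chunk per group of l chunks,
    and (k + m) must be a multiple of l.
    """
    if (k + m) % l != 0:
        raise ValueError("k + m must be a multiple of l")
    return k + m + (k + m) // l


def storage_overhead(total_chunks: int, k: int) -> float:
    """Raw space used per unit of user data."""
    return total_chunks / k


lrc_total = lrc_chunk_count(k=9, m=3, l=4)   # 9 + 3 + 3 local parity chunks
jerasure_total = 9 + 6                       # jerasure k=9, m=6

print(lrc_total, jerasure_total)
print(round(storage_overhead(lrc_total, 9), 2))
```

Both layouts come out at 15 chunks, i.e. an overhead of about 1.67,
which is within the <= 2 target. The count alone says nothing about
which combinations of lost chunks are recoverable, so it does not answer
the equivalence question by itself.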

The resulting rule, which looks correct to me, is:

--------

{
    "rule_id": 6,
    "rule_name": "test_lrc_2",
    "ruleset": 6,
    "type": 3,
    "min_size": 3,
    "max_size": 15,
    "steps": [
        {
            "op": "set_chooseleaf_tries",
            "num": 5
        },
        {
            "op": "set_choose_tries",
            "num": 100
        },
        {
            "op": "take",
            "item": -4,
            "item_name": "default~hdd"
        },
        {
            "op": "choose_indep",
            "num": 3,
            "type": "datacenter"
        },
        {
            "op": "chooseleaf_indep",
            "num": 5,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}

------------
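
As a sanity check on the rule, the number of OSD slots it asks for can
be computed from its two placement steps. This is only a sketch: it
assumes every "num" is positive (0 and negative values have a special
meaning in CRUSH and are not handled), and the helper name is made up
for illustration.

```python
import json

# The two placement steps of the rule above, copied as-is.
RULE_STEPS = json.loads("""
[
  {"op": "choose_indep", "num": 3, "type": "datacenter"},
  {"op": "chooseleaf_indep", "num": 5, "type": "host"}
]
""")


def slots_requested(steps):
    """Multiply the 'num' of each choose/chooseleaf step.

    Assumption: all 'num' values are positive.
    """
    total = 1
    for step in steps:
        if step["op"].startswith("choose"):
            if step["num"] <= 0:
                raise ValueError("non-positive 'num' is not handled here")
            total *= step["num"]
    return total


print(slots_requested(RULE_STEPS))
```

This gives 3 x 5 = 15 slots, which matches the 15 chunks of the profile.
If I understand the rule correctly, it also means that each datacenter
must offer at least 5 distinct hosts with OSDs under "default~hdd".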

Unfortunately, it doesn't work as expected: a pool created with this
rule ends up with its PGs active+undersized, which I did not expect.
Looking at the 'ceph health detail' output, I see for each PG
something like:

pg 52.14 is stuck undersized for 27m, current state active+undersized, 
last acting 
[90,113,2147483647,103,64,147,164,177,2147483647,133,58,28,8,32,2147483647]

For each PG, there are 3 '2147483647' entries and I guess they are the
cause of the problem. What are these entries about? Clearly they are not
OSD IDs... It looks like a negative number, -1, which in terms of
crushmap ID is the crushmap root (named "default" in our configuration).
Did I make some trivial mistake?
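
The value itself can be checked independently of Ceph: 2147483647 is
2^31 - 1 (0x7fffffff), the largest signed 32-bit integer, whereas -1
read as an unsigned 32-bit value would be 4294967295. To my knowledge
0x7fffffff is the CRUSH_ITEM_NONE constant, used when no OSD could be
chosen for a slot, but take that interpretation as an assumption. The
sketch below locates the placeholder entries in the acting set quoted
above, assuming that consecutive runs of 5 entries correspond to one
datacenter (that ordering is also an assumption).

```python
ACTING = [90, 113, 2147483647, 103, 64, 147, 164, 177, 2147483647,
          133, 58, 28, 8, 32, 2147483647]

PLACEHOLDER = 2147483647
GROUP_SIZE = 5  # assumption: each run of 5 entries maps to one datacenter

# The placeholder is INT32_MAX, not -1 reinterpreted as unsigned.
assert PLACEHOLDER == 2**31 - 1 == 0x7FFFFFFF
assert (-1) & 0xFFFFFFFF == 4294967295


def holes_per_group(acting, group_size, placeholder):
    """Count placeholder entries in each consecutive group of group_size."""
    group_count = -(-len(acting) // group_size)  # ceiling division
    counts = [0] * group_count
    for index, osd in enumerate(acting):
        if osd == placeholder:
            counts[index // group_size] += 1
    return counts


hole_indexes = [i for i, osd in enumerate(ACTING) if osd == PLACEHOLDER]
print(hole_indexes)                                      # [2, 8, 14]
print(holes_per_group(ACTING, GROUP_SIZE, PLACEHOLDER))  # [1, 1, 1]
```

Under that grouping assumption, exactly one slot is missing in each
group of 5 for this PG, i.e. only 4 of the 5 requested OSDs were found
per datacenter.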

Thanks in advance for any help, or for sharing a working configuration.

Best regards,

Michel