Hello,

What is your current setup, 1 server per data center with 12 OSDs each? What
is your current crush rule and LRC crush rule?
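Something like the following should show both (the rule name "test_lrc_2" is
the one from your earlier mail; "lrc_profile" is a placeholder for whatever
profile name you used when creating the pool):

--------
# cluster layout: datacenters, hosts and OSDs
ceph osd tree

# the crush rule generated for the LRC pool
ceph osd crush rule dump test_lrc_2

# the erasure-code profiles, and the LRC one in detail
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get lrc_profile
--------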
On Fri, Apr 28, 2023, 12:29 Michel Jouvin <michel.jouvin@xxxxxxxxxxxxxxx> wrote:

> Hi,
>
> I think I found a possible cause of my PGs going down, but I still don't
> understand why. As explained in a previous mail, I set up a 15-chunk/OSD
> EC pool (k=9, m=6), but I have only 12 OSD servers in the cluster. To work
> around the problem I defined the failure domain as 'osd', with the
> reasoning that, as I was using the LRC plugin, I had the guarantee that I
> could lose a site without impact, and thus the possibility to lose 1 OSD
> server. Am I wrong?
>
> Best regards,
>
> Michel
>
> On 24/04/2023 at 13:24, Michel Jouvin wrote:
> > Hi,
> >
> > I'm still interested in getting feedback from those using the LRC
> > plugin about the right way to configure it... Last week I upgraded
> > from Pacific to Quincy (17.2.6) with cephadm, which does the upgrade
> > host by host, checking if an OSD is ok to stop before actually
> > upgrading it. I was surprised to see 1 or 2 PGs down at some points
> > during the upgrade (it did not happen for all OSDs, but it did happen
> > in every site/datacenter). Looking at the details with "ceph health
> > detail", I saw that for these PGs there were 3 OSDs down, but I was
> > expecting the pool to be resilient to 6 OSDs down (5 for R/W access),
> > so I'm wondering if there is something wrong in our pool configuration
> > (k=9, m=6, l=5).
> >
> > Cheers,
> >
> > Michel
> >
> > On 06/04/2023 at 08:51, Michel Jouvin wrote:
> >> Hi,
> >>
> >> Is somebody using the LRC plugin?
> >>
> >> I came to the conclusion that LRC k=9, m=3, l=4 is not the same as
> >> jerasure k=9, m=6 in terms of protection against failures, and that I
> >> should use k=9, m=6, l=5 to get a level of resilience >= jerasure
> >> k=9, m=6. The example in the documentation (k=4, m=2, l=3) suggests
> >> that this LRC configuration gives something better than jerasure k=4,
> >> m=2, as it is resilient to 3 drive failures (but not 4, if I
> >> understood properly). So how many drives can fail in the k=9, m=6,
> >> l=5 configuration, first without losing RW access and second without
> >> losing data?
> >>
> >> Another thing that I don't quite understand is that a pool created
> >> with this configuration (and failure domain=osd, locality=datacenter)
> >> has a min_size=3 (max_size=18 as expected). It seems wrong to me, I'd
> >> have expected something ~10 (depending on the answer to the previous
> >> question)...
> >>
> >> Thanks in advance if somebody could provide some sort of
> >> authoritative answer on these 2 questions. Best regards,
> >>
> >> Michel
> >>
> >> On 04/04/2023 at 15:53, Michel Jouvin wrote:
> >>> Answering myself, I found the reason for 2147483647: it's documented
> >>> as a failure to find enough OSDs (missing OSDs). And it is normal,
> >>> as I selected different hosts for the 15 OSDs but I have only 12
> >>> hosts!
> >>>
> >>> I'm still interested in an "expert" confirming that the LRC k=9,
> >>> m=3, l=4 configuration is equivalent, in terms of redundancy, to a
> >>> jerasure configuration with k=9, m=6.
> >>>
> >>> Michel
> >>>
> >>> On 04/04/2023 at 15:26, Michel Jouvin wrote:
> >>>> Hi,
> >>>>
> >>>> As discussed in another thread (Crushmap rule for multi-datacenter
> >>>> erasure coding), I'm trying to create an EC pool spanning 3
> >>>> datacenters (datacenters are present in the crushmap), with the
> >>>> objective of being resilient to 1 DC down, at least keeping
> >>>> read-only access to the pool and, if possible, read-write access,
> >>>> and of having a storage efficiency better than 3 replicas (let's
> >>>> say a storage overhead <= 2).
> >>>>
> >>>> In the discussion, somebody mentioned the LRC plugin as a possible
> >>>> jerasure alternative to implement this without tweaking the
> >>>> crushmap rule to implement the 2-step OSD allocation. I looked at
> >>>> the documentation
> >>>> (https://docs.ceph.com/en/latest/rados/operations/erasure-code-lrc/)
> >>>> but I have some questions, if someone has experience/expertise with
> >>>> this LRC plugin.
> >>>>
> >>>> I tried to create a rule using 5 OSDs per datacenter (15 in total),
> >>>> with 3 per datacenter (9 in total) being data chunks and the others
> >>>> being coding chunks. For this, based on my understanding of the
> >>>> examples, I used k=9, m=3, l=4. Is that right? Is this
> >>>> configuration equivalent, in terms of redundancy, to a jerasure
> >>>> configuration with k=9, m=6?
> >>>>
> >>>> The resulting rule, which looks correct to me, is:
> >>>>
> >>>> --------
> >>>>
> >>>> {
> >>>>     "rule_id": 6,
> >>>>     "rule_name": "test_lrc_2",
> >>>>     "ruleset": 6,
> >>>>     "type": 3,
> >>>>     "min_size": 3,
> >>>>     "max_size": 15,
> >>>>     "steps": [
> >>>>         {
> >>>>             "op": "set_chooseleaf_tries",
> >>>>             "num": 5
> >>>>         },
> >>>>         {
> >>>>             "op": "set_choose_tries",
> >>>>             "num": 100
> >>>>         },
> >>>>         {
> >>>>             "op": "take",
> >>>>             "item": -4,
> >>>>             "item_name": "default~hdd"
> >>>>         },
> >>>>         {
> >>>>             "op": "choose_indep",
> >>>>             "num": 3,
> >>>>             "type": "datacenter"
> >>>>         },
> >>>>         {
> >>>>             "op": "chooseleaf_indep",
> >>>>             "num": 5,
> >>>>             "type": "host"
> >>>>         },
> >>>>         {
> >>>>             "op": "emit"
> >>>>         }
> >>>>     ]
> >>>> }
> >>>>
> >>>> ------------
> >>>>
> >>>> Unfortunately, it doesn't work as expected: a pool created with
> >>>> this rule ends up with its PGs active+undersized, which is
> >>>> unexpected to me. Looking at `ceph health detail` output, I see for
> >>>> each PG something like:
> >>>>
> >>>> pg 52.14 is stuck undersized for 27m, current state
> >>>> active+undersized, last acting
> >>>> [90,113,2147483647,103,64,147,164,177,2147483647,133,58,28,8,32,2147483647]
> >>>>
> >>>> For each PG, there are 3 '2147483647' entries and I guess this is
> >>>> the reason for the problem. What are these entries about? Clearly
> >>>> they are not OSD IDs... It looks like a negative number, -1, which
> >>>> in terms of crushmap ID is the crushmap root (named "default" in
> >>>> our configuration). Any trivial mistake I may have made?
> >>>>
> >>>> Thanks in advance for any help or for sharing any successful
> >>>> configuration!
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Michel
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
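For reference, a minimal sketch of how the k=9, m=6, l=5 LRC profile
discussed in this thread can be declared and attached to a pool. The profile
name, pool name and pg_num below are illustrative placeholders, and failure
domain 'osd' is the workaround described in the 28/04 mail, not a statement
that it is the right choice:

--------
# LRC profile: 9 data + 6 coding chunks, grouped by datacenter in groups of
# l=5 with one extra local parity chunk per group (18 chunks total, 6 per
# datacenter, matching the max_size=18 mentioned in the thread)
ceph osd erasure-code-profile set lrc_k9m6l5 \
    plugin=lrc k=9 m=6 l=5 \
    crush-locality=datacenter \
    crush-failure-domain=osd \
    crush-root=default

# create a pool using that profile (pg_num chosen arbitrarily here)
ceph osd pool create test_lrc_pool 256 256 erasure lrc_k9m6l5

# check the size/min_size the pool ended up with
ceph osd pool get test_lrc_pool size
ceph osd pool get test_lrc_pool min_size
--------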