Ah I see, I should have looked at the “raw” data instead ;-)
Then I agree, this is very weird.

Best,
Jesper

--------------------------
Jesper Lykkegaard Karlsen
Scientific Computing
Centre for Structural Biology
Department of Molecular Biology and Genetics
Aarhus University
Universitetsbyen 81
8000 Aarhus C

E-mail: jelka@xxxxxxxxx
Tlf: +45 50906203

> On 28 Jul 2022, at 12.45, Frank Schilder <frans@xxxxxx> wrote:
>
> Hi Jesper,
>
> thanks for looking at this. The failure domain is OSD and not host. I typed it wrong in the text; the copy of the crush rule shows it correctly: step choose indep 0 type osd.
>
> I'm trying to reproduce the observation to file a tracker item, but it is more difficult than expected. It might be a race condition; so far I haven't seen it again. I hope I can figure out when and why this is happening.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Jesper Lykkegaard Karlsen <jelka@xxxxxxxxx>
> Sent: 28 July 2022 12:02:51
> To: Frank Schilder
> Cc: ceph-users@xxxxxxx
> Subject: Re: PG does not become active
>
> Hi Frank,
>
> I think you need at least 6 OSD hosts to make EC 4+2 with failure domain host.
>
> I do not know how it was possible for you to create that configuration in the first place.
> Could it be that you have multiple names for the OSD hosts?
> That would at least explain one OSD down being shown as two OSDs down.
>
> Also, I believe that min_size should never be smaller than the number of data (k) shards, which is 4 in this case.
>
> You can either make a new test setup with your three test OSD hosts using EC 2+1, or make e.g. 4+2 but with the failure domain set to OSD.
>
> Best,
> Jesper
>
> --------------------------
> Jesper Lykkegaard Karlsen
> Scientific Computing
> Centre for Structural Biology
> Department of Molecular Biology and Genetics
> Aarhus University
> Universitetsbyen 81
> 8000 Aarhus C
>
> E-mail: jelka@xxxxxxxxx
> Tlf: +45 50906203
>
>> On 27 Jul 2022, at 17.32, Frank Schilder <frans@xxxxxx> wrote:
>>
>> Update: the inactive PG got recovered and became active after a very long wait. The middle question is now answered. However, these two questions are still of great concern:
>>
>> - How can 2 OSDs be missing if only 1 OSD is down?
>> - If the PG should recover, why is it not prioritised considering its severe degradation
>>   compared with all other PGs?
>>
>> I don't understand how a PG can lose 2 shards if 1 OSD goes down. That looks really, really bad to me (did Ceph lose track of data?).
>>
>> The second question is of no less importance. The inactive PG was holding back client IO, leading to further warnings about slow OPS/requests/... Why are such critically degraded PGs not scheduled for recovery first? There is a service outage, yet only a health warning.
>>
>> Thanks and best regards.
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> ________________________________________
>> From: Frank Schilder <frans@xxxxxx>
>> Sent: 27 July 2022 17:19:05
>> To: ceph-users@xxxxxxx
>> Subject: PG does not become active
>>
>> I'm testing Octopus 15.2.16 and ran into a problem right away. I'm filling up a small test cluster with 3 hosts (3x3 OSDs) and killed one OSD to see how recovery works. I have one 4+2 EC pool with failure domain host, and on 1 PG of this pool 2 (!!!) shards are missing. This most degraded PG is not becoming active; it's stuck inactive but peered.
>>
>> Questions:
>>
>> - How can 2 OSDs be missing if only 1 OSD is down?
>> - Wasn't there an important code change to allow recovery for an EC PG with at
>>   least k shards present even if min_size > k? Do I have to set something?
>> - If the PG should recover, why is it not prioritised considering its severe degradation
>>   compared with all other PGs?
>>
>> I have already increased these crush tunables and executed a pg repeer, to no avail:
>>
>> tunable choose_total_tries 250       <-- default 100
>> rule fs-data {
>>         id 1
>>         type erasure
>>         min_size 3
>>         max_size 6
>>         step set_chooseleaf_tries 50 <-- default 5
>>         step set_choose_tries 200    <-- default 100
>>         step take default
>>         step choose indep 0 type osd
>>         step emit
>> }
>>
>> Ceph health detail says about this:
>>
>> [WRN] PG_AVAILABILITY: Reduced data availability: 1 pg inactive
>>     pg 4.32 is stuck inactive for 37m, current state recovery_wait+undersized+degraded+remapped+peered, last acting [1,2147483647,2147483647,4,5,2]
>>
>> I don't want to cheat and set min_size=k on this pool. It should work by itself.
>>
>> Thanks for any pointers!
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> _______________________________________________
>> ceph-users mailing list -- ceph-users@xxxxxxx
>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
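PS: Regarding the suggestion above to recreate the test pool with the failure domain set to OSD, here is a minimal sketch of how that could look. The profile and pool names (ec42-osd, test-ec42) and the pg_num of 32 are placeholder choices for a small test cluster, not anything taken from the thread:

    # EC 4+2 profile that places shards per OSD instead of per host
    ceph osd erasure-code-profile set ec42-osd k=4 m=2 crush-failure-domain=osd
    ceph osd erasure-code-profile get ec42-osd

    # create a test pool from that profile and check the min_size it ends up with
    # (recent releases should default an EC pool's min_size to k+1, i.e. 5 here)
    ceph osd pool create test-ec42 32 32 erasure ec42-osd
    ceph osd pool get test-ec42 min_size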
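For the stuck PG itself, a few standard commands can show which shards are actually missing and which settings are in effect. The PG id 4.32 comes from the health output quoted above; the pool name fs-data is only an assumption based on the crush rule name, so substitute the real pool name:

    # show the peering/recovery state and which shard OSDs are missing
    ceph pg 4.32 query
    ceph pg dump_stuck inactive

    # pool and crush settings actually in effect
    ceph osd pool get fs-data min_size
    ceph osd pool get fs-data erasure_code_profile
    ceph osd getcrushmap -o crush.bin && crushtool -d crush.bin -o crush.txt

    # re-trigger peering on just this PG (already tried in the thread, to no avail)
    ceph pg repeer 4.32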
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx