Hi Frank,

We have a lot of small objects in the cluster... RocksDB has issues with the
compaction, causing high disk load... That's why we are performing manual
compaction... See https://github.com/ceph/ceph/pull/37496
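For reference, offline compaction of a single OSD's RocksDB boils down to
roughly the following (a sketch only; the osd id and data path are
placeholders, and it's worth double-checking the tool options on your own
release):

  # stop the OSD so the bluestore kv store can be opened exclusively
  systemctl stop ceph-osd@12

  # compact the RocksDB sitting underneath bluestore
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-12 compact

  # bring the OSD back up
  systemctl start ceph-osd@12

  # there is also an online variant via the admin socket on the OSD host:
  # ceph daemon osd.12 compact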
Br,
Kristof

On Mon 26 Oct 2020 at 12:14, Frank Schilder <frans@xxxxxx> wrote:

> Hi Kristof,
>
> I missed that: why do you need to do manual compaction?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Kristof Coucke <kristof.coucke@xxxxxxxxx>
> Sent: 26 October 2020 11:33:52
> To: Frank Schilder; a.jazdzewski@xxxxxxxxxxxxxx
> Cc: ceph-users@xxxxxxx
> Subject: Re: Question about expansion existing Ceph cluster - adding OSDs
>
> Hi Ansgar, Frank, all,
>
> First of all, thanks for the feedback.
>
> In the meantime, I've added all the disks and the cluster is rebalancing
> itself... which will take ages, as you've mentioned. Last week, after this
> conversation, it was at a bit over 50%; today it's around 44.5%.
> Every day I have to take the cluster down to run manual compaction on
> some disks :-(, but that's a known bug that Igor is working on. (Kudos to
> him when I get my sleep back at night thanks to this one...)
>
> Though, I'm still having an issue which I don't completely understand.
> When I look into the Ceph dashboard - OSDs, I can see the number of PGs
> for a specific OSD. Does someone know how this is calculated? Because it
> seems incorrect...
> E.g. a specific disk shows 189 PGs in the dashboard...? However, examining
> the pg dump output, I can see that for that particular disk there are 145
> PGs where the disk is in the "up" list, and 168 PGs where that particular
> disk is in the "acting" list... The two lists have 135 PGs in common,
> meaning 10 PGs still need to be moved to that disk, while 33 PGs need to
> be moved away...
> I can't figure out how the dashboard gets to the figure of 189... The same
> goes for other disks (a delta between the pg dump output and the info in
> the Ceph dashboard).
>
> Another example is a disk which I've set to weight 0 as it's marked as
> having a predicted failure in the future... Its "up" count is 0 (which is
> correct), and the number of PGs where this disk is in the acting set is
> 49. So that seems correct, as these 49 PGs need to be moved away.
> However... the Ceph dashboard UI says there are 71 PGs on that disk...
>
> So:
> - How does the Ceph dashboard get that number in the first place?
> - Is it possible that there are "orphaned" PG parts left behind on a
>   particular OSD?
> - If it is possible that orphaned parts of a PG are left behind on a
>   disk, how do I clean them up?
>
> I've also tried examining the osdmap; however, the output seems to be
> limited (??). I only see the PGs for pools 1 and 2. (I don't know whether
> the file gets truncated when exporting the osd map, or by
> osdmaptool --print.)
>
> The cluster is running Nautilus v14.2.11, all on the same version.
>
> I'll make some time to write documentation and to document the findings
> from the journey of the last 2 weeks... Kristof in Ceph's wunderland...
>
> Thanks for all your input so far!
>
> Regards,
>
> Kristof
>
>
> On Wed 21 Oct 2020 at 14:01, Frank Schilder <frans@xxxxxx> wrote:
> There have been threads on exactly this. It might depend a bit on your
> Ceph version. We are running mimic and have no issues doing:
>
> - set noout, norebalance, nobackfill
> - add all OSDs (with weight 1)
> - wait for peering to complete
> - unset all flags and let the rebalance loose
>
> Starting with Nautilus there seem to be issues with this procedure;
> mainly, the peering phase can cause a collapse of the cluster. In your
> case, it sounds like you have added the OSDs already. You should be able
> to do the following relatively safely:
>
> - set noout, norebalance, nobackfill
> - set the weight of the OSDs to 1 one by one, and wait for peering to
>   complete every time
> - unset all flags and let the rebalance loose
>
> I believe that once the peering has succeeded without crashes, the
> rebalancing will just work fine. You can easily control how much
> rebalancing is going on.
>
> I noted that Ceph seems to have a strange concept of priority, though. I
> needed to gain capacity by adding OSDs, and Ceph was very consistent about
> moving PGs off the fullest OSDs last. The opposite of what should happen.
> Thus, it took ages for the additional capacity to become available, and
> the backfill_toofull warnings stayed the whole time. You can influence
> this to some degree by using force_recovery commands on PGs on the
> fullest OSDs.
>
> Best regards and good luck,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Kristof Coucke <kristof.coucke@xxxxxxxxx>
> Sent: 21 October 2020 13:29:00
> To: ceph-users@xxxxxxx
> Subject: Question about expansion existing Ceph cluster - adding OSDs
>
> Hi,
>
> I have a cluster with 182 OSDs; this has been expanded to 282 OSDs.
> Some disks were near full.
> The new disks have been added with an initial weight of 0.
> The original plan was to increase this slowly towards their full weight
> using the gentle reweight script. However, this is going way too slowly,
> and I'm now also having issues with "backfill_toofull".
> Can I just add all the OSDs with their full weight, or will I get a lot
> of issues if I do that?
> I know that a lot of PGs will have to be re-placed, but increasing the
> weight slowly will take a year at the current speed. I'm already playing
> with the max backfill setting to increase the speed, but every time I
> increase the weight it takes a lot of time again...
> I can accept the fact that there will be a performance decrease.
>
> Looking forward to your comments!
>
> Regards,
>
> Kristof
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx
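PS, for anyone finding this thread in the archives: the flag-and-reweight
sequence Frank describes above maps roughly onto the commands below (a
rough sketch, not our literal shell history; the osd id, target weight and
pg id are placeholders to fill in for your own setup):

  # pause data movement while the new OSDs peer
  ceph osd set noout
  ceph osd set norebalance
  ceph osd set nobackfill

  # raise one OSD to its target crush weight, then wait until
  # "ceph pg stat" no longer shows peering PGs; repeat per OSD
  ceph osd crush reweight osd.182 <target-weight>
  ceph pg stat

  # when all OSDs are at their final weight, let the rebalance loose
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph osd unset noout

  # optional knobs: throttle backfill, or nudge PGs on the fullest OSDs
  ceph tell 'osd.*' injectargs '--osd_max_backfills=1'
  ceph pg force-recovery <pgid>
  ceph pg force-backfill <pgid>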