Frank,

Then if you have only a few OSDs with excessive PG counts / usage, do you
reweight them down by something like 10-20% to achieve a better distribution
and improve capacity? Do you weight them back to normal after the PGs have
moved? I wondered if manually picking out some of the higher-usage OSDs could
get to a good outcome and avoid continuous rebalancing or other issues.

Thanks,
Matt
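
P.S. For concreteness, this is roughly the sequence I have in mind. It is only
a sketch; the OSD id 12 and the 0.85 value are placeholders:

  # check utilization and PG counts to find the fullest OSDs
  ceph osd df tree

  # temporarily lower the override weight of one full OSD by ~15%
  ceph osd reweight 12 0.85

  # once the PGs have moved and usage looks better, restore the default weight
  ceph osd reweight 12 1.0
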
On Mon, Dec 5, 2022 at 4:32 AM Frank Schilder <frans@xxxxxx> wrote:
> Hi Matt,
>
> I can't comment on balancers, I don't use them. I manually re-weight OSDs, which fits well with our pools' OSD allocation. Also, we don't aim for perfect balance, we just remove the peak of allocation on the fullest few OSDs to avoid excessive capacity loss. Not balancing too much has the advantage of being fairly stable under OSD failures/additions at the expense of a few % less capacity.
>
> Maybe someone else can help here?
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Matt Larson <larsonmattr@xxxxxxxxx>
> Sent: 04 December 2022 02:00:11
> To: Eneko Lacunza
> Cc: Frank Schilder; ceph-users
> Subject: Re: Re: What to expect on rejoining a host to cluster?
>
> Thank you Frank and Eneko,
>
> Without help and support from ceph admins like you, I would be adrift. I really appreciate this.
>
> I rejoined the host one week ago, and the cluster has been dealing with the misplaced objects and recovering well.
>
> I will use this strategy in the future:
>
> "If you consider replacing the host and all disks, get a new host first and give it the host name in the crush map. Just before you deploy the new host, simply purge all down OSDs in its bucket (set norebalance) and deploy. Then, the data movement is restricted to re-balancing to the new host.
>
> If you just want to throw out the old host, destroy the OSDs but keep the IDs intact (ceph osd destroy). Then, no further re-balancing will happen and you can re-use the OSD ids later when adding a new host. That's a stable situation from an operations point of view."
>
> The last question I have: I am now seeing that some OSDs have an uneven load of PGs. Which balancer do you recommend, and are there any caveats for how the balancer operations can affect or slow the cluster?
>
> Thanks,
> Matt
>
> On Mon, Nov 28, 2022 at 2:23 AM Eneko Lacunza <elacunza@xxxxxxxxx> wrote:
>
> Hi Matt,
>
> Also, make sure that the rejoining host has the correct time. I have seen clusters going down when rejoining hosts that were down for maintenance for several weeks and came in with datetime deltas of some months (no idea why that happened, I arrived with the firefighter team ;-) )
>
> Cheers
>
> On 27/11/22 at 13:27, Frank Schilder wrote:
>
> Hi Matt,
>
> if you didn't touch the OSDs on that host, they will join and only objects that have been modified will actually be updated. Ceph keeps some basic history information and can detect changes. 2 weeks is not a very long time. If you have a lot of cold data, re-integration will go fast.
>
> Initially, you will see a huge amount of misplaced objects. However, this count will go down much faster than the objects/s recovery rate would suggest.
>
> Before you rejoin the host, I would fix its issues though. Now that you have it out of the cluster, do the maintenance first. There is no rush. In fact, you can buy a new host, install the OSDs in the new one and join that to the cluster with the host name of the old host.
>
> If you consider replacing the host and all disks, then get a new host first and give it the host name in the crush map. Just before you deploy the new host, simply purge all down OSDs in its bucket (set norebalance) and deploy. Then, the data movement is restricted to re-balancing to the new host.
>
> If you just want to throw out the old host, destroy the OSDs but keep the IDs intact (ceph osd destroy). Then, no further re-balancing will happen and you can re-use the OSD ids later when adding a new host. That's a stable situation from an operations point of view.
>
> Hope that helps.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Matt Larson <larsonmattr@xxxxxxxxx>
> Sent: 26 November 2022 21:07:41
> To: ceph-users
> Subject: What to expect on rejoining a host to cluster?
>
> Hi all,
>
> I have a host with 16 OSDs, each 14 TB in capacity, that started having hardware issues causing it to crash. I took this host down 2 weeks ago, and the data rebalanced to the remaining 11 server hosts in the Ceph cluster over this time period.
>
> My initial goal was to then remove the host completely from the cluster with `ceph osd rm XX` and `ceph osd purge XX` (Adding/Removing OSDs — Ceph Documentation <https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/>). However, I found that after the large amount of data migration from the recovery, the purge and removal from the crush map for an OSD still required another large data move. It appears that it would have been a better strategy to assign a 0 weight to an OSD first, to have only a single larger data move instead of two.
>
> I'd like to join the downed server back into the Ceph cluster. It still has 14 OSDs that are listed as out/down that would be brought back online. My question is: what can I expect if I bring this host back online? Will the OSDs of a host that has been offline for an extended period of time and out of the cluster have PGs that are now quite different or inconsistent? Will this be problematic?
>
> Thanks for any advice,
> Matt
>
> --
> Matt Larson, PhD
> Madison, WI 53705 U.S.A.
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/

--
Matt Larson, PhD
Madison, WI 53705 U.S.A.
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
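
For reference, the two removal strategies discussed in this thread (drain the OSDs first, or destroy them while keeping their ids for a replacement host) map to commands roughly like the following. This is only a sketch; OSD id 16 stands in for each OSD in the failed host's crush bucket:

  # Option A: drain the OSD first, then purge it (a single data movement)
  ceph osd crush reweight osd.16 0
  # ...wait for backfill to finish, then remove it from the osd and crush maps
  ceph osd purge 16 --yes-i-really-mean-it

  # Option B: keep the OSD ids for a replacement host (no extra data movement)
  ceph osd set norebalance
  ceph osd destroy 16 --yes-i-really-mean-it   # repeat for each down OSD
  # ...deploy the new host under the old host name in the crush map, then
  ceph osd unset norebalance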