They shouldn't, but you can have cases where some OSDs/pools are
active+clean, which will trip a rebalance, etc., while other pools/OSDs
are still backfilling, potentially putting further load on the
backfilling OSDs. It may be better now, but the peace of mind of knowing
I won't be chasing a balancer/autoscaler ghost is easy enough to get:
disable both for the expansion, then re-enable them once backfill has
fully completed. It's also helpful during the initial phase, since it
keeps data from starting to move only to be redirected again a short
time later.

Reed

> On Mar 15, 2021, at 2:57 AM, Caspar Smit <casparsmit@xxxxxxxxxxx> wrote:
>
> Hi,
>
> I thought the balancer and pg_autoscaler only do something if all the
> PGs are in the active+clean state?
> So if there is any backfilling going on, they just bail out.
>
> Or did you mean during the norecover/nobackfill/noout phase?
>
> Kind regards,
> Caspar
>
> Op do 11 mrt. 2021 om 23:54 schreef Reed Dier <reed.dier@xxxxxxxxxxx>:
>
>> I'm sure there is a "correct" way, but I think it mostly comes down to
>> how busy your cluster is, and how tolerant it is of the added load
>> from the backfills.
>>
>> My current modus operandi is to set the noin, noout, nobackfill,
>> norecover, and norebalance flags first.
>> This makes sure that new OSDs don't come in, current OSDs don't go
>> out, and nothing starts backfilling or rebalancing (yet).
>>
>> Add all of my OSDs.
>>
>> Then unset noin and norebalance.
>> Mark all of the new OSDs in.
>> Let it work out the new CRUSH map, so that data isn't constantly in
>> motion, moving back and forth, as new OSD hosts are added.
>> Inject osd_max_backfills and osd_recovery_max_active set to 1.
>> Then unset norecover, nobackfill, and noout.
>>
>> Then it should slowly but surely chip away at recovery.
>> During times of lighter load I can ratchet osd_max_backfills and
>> osd_recovery_max_active up to higher values to chug through more of it
>> while iops aren't being burned.
>>
>> I'm sure everyone has their own way, but I've been very comfortable
>> with this approach over the last few years.
>>
>> NOTE: you probably want to make sure that the balancer and the
>> pg_autoscaler are set to off during this, otherwise they might throw
>> backfills on the pile and you will feel like you'll never reach the
>> bottom.
>>
>> Reed
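
In command form, Reed's procedure maps to roughly the following; a
sketch assuming a Nautilus-era ceph CLI, with pool names and OSD ids
illustrative:

    # Freeze state before adding hardware: nothing comes in, goes out,
    # backfills, recovers, or rebalances yet.
    ceph osd set noin
    ceph osd set noout
    ceph osd set nobackfill
    ceph osd set norecover
    ceph osd set norebalance

    # Keep the balancer and pg_autoscaler from piling on backfills.
    ceph balancer off
    ceph osd pool set <pool> pg_autoscale_mode off   # repeat per pool

    # ... add all of the new OSDs, then let them come in:
    ceph osd unset noin
    ceph osd unset norebalance
    ceph osd in <osd-id> [<osd-id> ...]

    # Throttle recovery before letting it start.
    ceph tell 'osd.*' injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'

    # Let recovery begin; ratchet the throttles back up during light load.
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset noout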
>>
>>> On Mar 10, 2021, at 9:55 AM, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
>>>
>>> Hello,
>>>
>>> I am currently in the process of expanding my Nautilus cluster from 3
>>> nodes (combined OSD/MGR/MON/MDS) to 6 OSD nodes and 3 management
>>> nodes. The old and new OSD nodes all have 8 x 12TB HDDs plus NVMe.
>>> The front and back networks are 10Gb.
>>>
>>> Last Friday evening I added a whole new OSD node, increasing the OSD
>>> HDD count from 24 to 32. As of this morning the cluster is still
>>> rebalancing, with periodic warnings about degraded PGs and missed
>>> deep-scrub deadlines. So after 4.5 days my misplaced PGs are down
>>> from 33% to 2%.
>>>
>>> My question: for a cluster of this size, what is the best-practice
>>> procedure for adding OSDs? Should I use 'ceph-volume prepare' to lay
>>> out the new OSDs but only add them a couple at a time, or should I
>>> continue adding whole nodes?
>>>
>>> Maybe this has to do with a maximum percentage of misplaced PGs. The
>>> first new node increased the OSD capacity by 33% and resulted in 33%
>>> PG misplacement. The next node will only result in 25% misplacement.
>>> If too high a percentage of misplaced PGs negatively impacts
>>> rebalancing or data availability, what is a reasonable ceiling for
>>> this percentage?
>>>
>>> Thanks.
>>>
>>> -Dave
>>>
>>> --
>>> Dave Hall
>>> Binghamton University
>>> kdhall@xxxxxxxxxxxxxx
>>> 607-760-2328 (Cell)
>>> 607-777-4641 (Office)
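
For the "prepare now, add gradually" option Dave asks about, ceph-volume
can create OSDs without activating them; a rough sketch (device paths
and ids are illustrative, syntax as of Nautilus):

    # Prepare the new OSDs up front without starting them:
    ceph-volume lvm prepare --data /dev/sdb --block.db /dev/nvme0n1p1

    # See what was prepared (prints each OSD's id and fsid):
    ceph-volume lvm list

    # Later, activate a couple at a time as load allows:
    ceph-volume lvm activate <osd-id> <osd-fsid>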