Dave,

Worth just looking at utilisation across your OSDs. I've had PGs get stuck
in backfill_wait / backfill_toofull when I've added new OSDs: Ceph was
unable to move PGs onto a smaller-capacity OSD that was already quite full.
I had to increase the number of PGs (pg_num) on the pool, and do some
reweighting, for it to get sorted.

Reed's plan is a good one. Because my setup has been in quite a state of
flux recently, I've kept the autoscaler set to warn and the number of PGs
higher than normal for the short term.
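Roughly what that looks like on the CLI, from memory; the pool name and OSD
id below are placeholders, so substitute your own and check the options
against your release:

  # check per-OSD fill levels before and during the backfill
  ceph osd df tree

  # keep the autoscaler from piling extra backfills on while things are in flux
  ceph osd pool set <pool> pg_autoscale_mode warn

  # raise pg_num if backfill gets stuck toofull against a small, nearly full OSD
  ceph osd pool set <pool> pg_num <higher-value>

  # and/or nudge data off the full OSD
  ceph osd reweight <osd-id> 0.9

The flag shuffle and throttling Reed describes below are the usual ones:
ceph osd set/unset with noin, noout, nobackfill, norecover and norebalance,
ceph balancer off, and
ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'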
Cheers,
A

Sent from my iPhone

On 12 Mar 2021, at 04:38, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:

Reed,

Thank you.  This seems like a very well-thought-out approach.  Your note
about the balancer and the auto_scaler seems quite relevant as well.  I'll
give it a try when I add my next two nodes.

-Dave

--
Dave Hall
Binghamton University

On Thu, Mar 11, 2021 at 5:53 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:

> I'm sure there is a "correct" way, but I think it mostly relates to how
> busy your cluster is, and how tolerant it is of the added load from the
> backfills.
>
> My current modus operandi is to set the noin, noout, nobackfill,
> norecover, and norebalance flags first.
> This makes sure that new OSDs don't come in, current OSDs don't go out,
> and it doesn't start backfilling or try to rebalance (yet).
>
> Add all of my OSDs.
>
> Then unset noin and norebalance.
> "In" all of the new OSDs.
> Let it work out the new crush map so that data isn't constantly in motion,
> moving back and forth as new OSD hosts are added.
> Inject osd_max_backfills and osd_recovery_max_active to 1.
> Then unset norecover and nobackfill and noout.
>
> Then it should slowly but surely chip away at recovery.
> During times of lighter load I can ratchet up the max backfills and
> recovery max actives to a higher level to chug through more of it while
> iops aren't being burned.
>
> I'm sure everyone has their own way, but I've been very comfortable with
> this approach over the last few years.
>
> NOTE: you probably want to make sure that the balancer and the
> pg_autoscaler are set to off during this, otherwise they might throw
> backfills on the pile and you will feel like you'll never reach the bottom.
>
> Reed
>
>> On Mar 10, 2021, at 9:55 AM, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> I am currently in the process of expanding my Nautilus cluster from 3
>> nodes (combined OSD/MGR/MON/MDS) to 6 OSD nodes and 3 management nodes.
>> The old and new OSD nodes all have 8 x 12TB HDDs plus NVMe.  The front
>> and back networks are 10Gb.
>>
>> Last Friday evening I injected a whole new OSD node, increasing the OSD
>> HDDs from 24 to 32.  As of this morning the cluster is still rebalancing,
>> with periodic warnings about degraded PGs and missed deep-scrub deadlines.
>> So after 4.5 days my misplaced PGs are down from 33% to 2%.
>>
>> My question:  For a cluster of this size, what is the best-practice
>> procedure for adding OSDs?  Should I use 'ceph-volume prepare' to lay out
>> the new OSDs but only add them a couple at a time, or should I continue
>> adding whole nodes?
>>
>> Maybe this has to do with a maximum percentage of misplaced PGs.  The
>> first new node increased the OSD capacity by 33% and resulted in 33% PG
>> misplacement.  The next node will only result in 25% misplacement.  If
>> too high a percentage of misplaced PGs negatively impacts rebalancing or
>> data availability, what is a reasonable ceiling for this percentage?
>>
>> Thanks.
>>
>> -Dave
>>
>> --
>> Dave Hall
>> Binghamton University
>> kdhall@xxxxxxxxxxxxxx
>> 607-760-2328 (Cell)
>> 607-777-4641 (Office)

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx