Re: Best way to add OSDs - whole node or one by one?

I'm sure there is a "correct" way, but I think it mostly relates to how busy your cluster is, and how tolerant it is of the added load from the backfills.

My current modus operandi is to set the noin, noout, nobackfill, norecover, and norebalance flags first.
This makes sure that new OSDs don't come in, current OSDs don't go out, and the cluster doesn't start backfilling or trying to rebalance (yet).
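
For reference, setting those is just the usual flag commands:

    ceph osd set noin
    ceph osd set noout
    ceph osd set nobackfill
    ceph osd set norecover
    ceph osd set norebalance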

Add all of my OSDs.
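
How you create them depends on your tooling; with plain ceph-volume it would be something like this per disk (the device paths here are only placeholders):

    ceph-volume lvm create --data /dev/sdX --block.db /dev/nvme0n1pY

With noin set, the new OSDs will start up but stay out until you mark them in.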

Then unset noin and norebalance.
Mark all of the new OSDs in.
Let it work out the new CRUSH map so that data isn't constantly in motion, moving back and forth, as new OSD hosts are added.
Set osd_max_backfills and osd_recovery_max_active to 1 via injectargs.
Then unset norecover, nobackfill, and noout (roughly the sequence sketched below).
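
In rough command form (the OSD IDs are placeholders for whatever you just created):

    ceph osd unset noin
    ceph osd unset norebalance
    ceph osd in <id> [<id> ...]        # mark each new OSD in
    ceph tell 'osd.*' injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset noout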

Then it should slowly but surely chip away at recovery.
During times of lighter load I can ratchet osd_max_backfills and osd_recovery_max_active back up to a higher level to chug through more of it while client IOPS aren't being burned.
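
For example (the values are just illustrative, and on Nautilus you can also persist them through the config database instead of injectargs):

    ceph config set osd osd_max_backfills 4
    ceph config set osd osd_recovery_max_active 4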

I'm sure everyone has their own way, but I've been very comfortable with this approach over the last few years.

NOTE: you probably want to make sure that the balancer and the pg_autoscaler are set to off during this, otherwise they might throw backfills on the pile and you will feel like you'll never reach the bottom.
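
Something along the lines of (pool name is a placeholder; you can also disable the autoscaler module outright):

    ceph balancer off
    ceph osd pool set <pool-name> pg_autoscale_mode off
    # or, wholesale:
    ceph mgr module disable pg_autoscaler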

Reed

> On Mar 10, 2021, at 9:55 AM, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> 
> Hello,
> 
> I am currently in the process of expanding my Nautilus cluster from 3 nodes (combined OSD/MGR/MON/MDS) to 6 OSD nodes and 3 management nodes.  The old and new OSD nodes all have 8 x 12TB HDDs plus NVMe.   The front and back networks are 10GB.
> 
> Last Friday evening I injected a whole new OSD node, increasing the OSD HDDs from 24 to 32.  As of this morning the cluster is still re-balancing - with periodic warnings about degraded PGs and missed deep-scrub deadlines.   So after 4.5 days my misplaced PGs are down from 33% to 2%.
> 
> My question:  For a cluster of this size, what is the best-practice procedure for adding OSDs?  Should I use 'ceph-volume prepare' to layout the new OSDs, but only add them a couple at a time, or should I continue adding whole nodes?
> 
> Maybe this has to do with a maximum percentage of misplaced PGs. The first new node increased the OSD capacity by 33% and resulted in 33% PG misplacement.  The next node will only result in 25% misplacement.  If a too high percentage of misplaced PGs negatively impacts rebalancing or data availability, what is a reasonable ceiling for this percentage?
> 
> Thanks.
> 
> -Dave
> 
> -- 
> Dave Hall
> Binghamton University
> kdhall@xxxxxxxxxxxxxx
> 607-760-2328 (Cell)
> 607-777-4641 (Office)
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


