Reed,

Thank you. This seems like a very well-thought-out approach. Your note
about the balancer and the pg_autoscaler seems quite relevant as well.
I'll give it a try when I add my next two nodes.

-Dave

--
Dave Hall
Binghamton University

On Thu, Mar 11, 2021 at 5:53 PM Reed Dier <reed.dier@xxxxxxxxxxx> wrote:

> I'm sure there is a "correct" way, but I think it mostly relates to how
> busy your cluster is, and how tolerant it is of the added load from the
> backfills.
>
> My current modus operandi is to set the noin, noout, nobackfill,
> norecover, and norebalance flags first.
> This makes sure that new OSDs don't come in, current OSDs don't go out,
> and the cluster doesn't start backfilling or try to rebalance (yet).
>
> Add all of my OSDs.
>
> Then unset noin and norebalance.
> Mark all of the new OSDs "in".
> Let the cluster work out the new CRUSH map so that data isn't constantly
> in motion, moving back and forth, as new OSD hosts are added.
> Inject osd_max_backfills and osd_recovery_max_active values of 1.
> Then unset norecover, nobackfill, and noout.
>
> Then it should slowly but surely chip away at recovery.
> During times of lighter load I can ratchet up the max backfills and
> recovery max actives to a higher level to chug through more of it while
> IOPS aren't being burned.
>
> I'm sure everyone has their own way, but I've been very comfortable with
> this approach over the last few years.
>
> NOTE: you probably want to make sure that the balancer and the
> pg_autoscaler are set to off during this, otherwise they might throw
> backfills on the pile and you will feel like you'll never reach the
> bottom.
>
> Reed
>
>
> On Mar 10, 2021, at 9:55 AM, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > I am currently in the process of expanding my Nautilus cluster from 3
> > nodes (combined OSD/MGR/MON/MDS) to 6 OSD nodes and 3 management nodes.
> > The old and new OSD nodes all have 8 x 12TB HDDs plus NVMe. The front
> > and back networks are 10Gb.
> >
> > Last Friday evening I added a whole new OSD node, increasing the OSD
> > HDD count from 24 to 32. As of this morning the cluster is still
> > rebalancing, with periodic warnings about degraded PGs and missed
> > deep-scrub deadlines. So after 4.5 days my misplaced PGs are down from
> > 33% to 2%.
> >
> > My question: For a cluster of this size, what is the best-practice
> > procedure for adding OSDs? Should I use 'ceph-volume prepare' to lay
> > out the new OSDs, but only add them a couple at a time, or should I
> > continue adding whole nodes?
> >
> > Maybe this has to do with a maximum percentage of misplaced PGs. The
> > first new node increased the OSD capacity by 33% and resulted in 33% PG
> > misplacement. The next node will only result in 25% misplacement. If
> > too high a percentage of misplaced PGs negatively impacts rebalancing
> > or data availability, what is a reasonable ceiling for this percentage?
> >
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> > 607-760-2328 (Cell)
> > 607-777-4641 (Office)
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
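
For anyone who wants to map Reed's steps onto concrete commands, below is a
minimal sketch using the standard ceph CLI (Nautilus-era syntax). The flag,
balancer, autoscaler, and injectargs commands are the stock ones; the pool
name, OSD ids, and the "quiet period" throttle values are placeholders and
examples, not recommendations.

  # 1. Pause data movement and keep new OSDs from being marked "in" automatically.
  ceph osd set noin
  ceph osd set noout
  ceph osd set nobackfill
  ceph osd set norecover
  ceph osd set norebalance

  # Keep the balancer and pg_autoscaler from piling more backfills on top.
  ceph balancer off
  ceph osd pool set <pool> pg_autoscale_mode off   # repeat per pool; <pool> is a placeholder

  # 2. Create/add all of the new OSDs here (e.g. with ceph-volume).

  # 3. Let the new OSDs join the CRUSH map and be marked "in", still with
  #    backfill and recovery held off.
  ceph osd unset noin
  ceph osd unset norebalance
  ceph osd in <osd-id>   # for any OSD created while noin was set; <osd-id> is a placeholder

  # 4. Throttle recovery before letting it start.
  ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'

  # 5. Let backfill and recovery proceed.
  ceph osd unset norecover
  ceph osd unset nobackfill
  ceph osd unset noout

  # 6. During quieter periods, raise the throttles to chew through the
  #    remaining backfills faster (example values only).
  ceph tell osd.* injectargs '--osd-max-backfills 4 --osd-recovery-max-active 4'

On Nautilus the same throttles can also be set with 'ceph config set osd
osd_max_backfills 1', which persists across restarts; injectargs is shown
here simply because it matches the "inject ... to 1" wording above.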