Re: Best way to add OSDs - whole node or one by one?

Hi,

I thought the balancer and pg_autoscaler only do something if all
the PGs are in the active+clean state?
So if there is any backfilling going on, they just bail out.
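
A quick way to double-check what they are currently set to do, for what
it's worth:

    ceph balancer status              # is the balancer on, and in which mode
    ceph osd pool autoscale-status    # per-pool pg_autoscale_mode and PG targets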

Or did you mean during the norecover/nobackfill/noout phase?

Kind regards,
Caspar

On Thu, Mar 11, 2021 at 23:54, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:

> I'm sure there is a "correct" way, but I think it mostly relates to how
> busy your cluster is, and how tolerant it is of the added load from the
> backfills.
>
> My current modus operandi is to set the noin, noout, nobackfill,
> norecover, and norebalance flags first.
> This makes sure that new OSDs don't come in, current OSDs don't go out,
> and the cluster doesn't start backfilling, recovering, or rebalancing (yet).
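>
> Roughly speaking, that's just:
>
>     ceph osd set noin
>     ceph osd set noout
>     ceph osd set nobackfill
>     ceph osd set norecover
>     ceph osd set norebalance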
>
> Add all of my OSDs.
>
> Then unset noin and norebalance, and mark all of the new OSDs in.
> Let it work out the new CRUSH map first, so that data isn't constantly in
> motion, moving back and forth, as each new OSD host is added.
> Set osd_max_backfills and osd_recovery_max_active to 1 via injectargs.
> Then unset norecover, nobackfill, and noout.
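>
> In command form that's roughly the following (the OSD ids are placeholders
> for whatever new ids were created):
>
>     ceph osd unset noin
>     ceph osd unset norebalance
>     ceph osd in <osd-id> [<osd-id> ...]
>     ceph tell osd.* injectargs '--osd_max_backfills 1 --osd_recovery_max_active 1'
>     ceph osd unset norecover
>     ceph osd unset nobackfill
>     ceph osd unset noout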
>
> Then it should slowly but surely chip away at recovery.
> During times of lighter load I can ratchet up the max backfills and
> recovery max actives to a higher level to chug through more of it while
> the iops aren't needed elsewhere.
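>
> For example, during a quiet window, something like this (the 4s are just
> illustrative, not a recommendation):
>
>     ceph tell osd.* injectargs '--osd_max_backfills 4 --osd_recovery_max_active 4'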
>
> I'm sure everyone has their own way, but I've been very comfortable with
> this approach over the last few years.
>
> NOTE: you probably want to make sure that the balancer and the
> pg_autoscaler are set to off during this, otherwise they might throw
> backfills on the pile and you will feel like you'll never reach the bottom.
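>
> For example (the pool name is a placeholder; on Nautilus the autoscaler
> is toggled per pool):
>
>     ceph balancer off
>     ceph osd pool set <pool-name> pg_autoscale_mode off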
>
> Reed
>
> > On Mar 10, 2021, at 9:55 AM, Dave Hall <kdhall@xxxxxxxxxxxxxx> wrote:
> >
> > Hello,
> >
> > I am currently in the process of expanding my Nautilus cluster from 3
> nodes (combined OSD/MGR/MON/MDS) to 6 OSD nodes and 3 management nodes.
> The old and new OSD nodes all have 8 x 12TB HDDs plus NVMe.  The front and
> back networks are 10Gb.
> >
> > Last Friday evening I injected a whole new OSD node, increasing the OSD
> HDDs from 24 to 32.  As of this morning the cluster is still re-balancing -
> with periodic warnings about degraded PGs and missed deep-scrub deadlines.
>  So after 4.5 days my misplaced PGs are down from 33% to 2%.
> >
> > My question:  For a cluster of this size, what is the best-practice
> procedure for adding OSDs?  Should I use 'ceph-volume prepare' to lay out
> the new OSDs, but only add them a couple at a time, or should I continue
> adding whole nodes?
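> >
> > For reference, the prepare/activate split I have in mind is roughly the
> > following (device paths are placeholders):
> >
> >     ceph-volume lvm prepare --data /dev/sdX --block.db /dev/nvme0n1pY
> >     # later, when ready to bring the OSDs up:
> >     ceph-volume lvm activate --all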
> >
> > Maybe this has to do with a maximum percentage of misplaced PGs. The
> first new node increased the OSD capacity by 33% (8 new OSDs on top of 24)
> and resulted in 33% PG misplacement.  The next node will only add 25%
> (8 on top of 32), and so should result in roughly 25% misplacement.  If
> too high a percentage of misplaced PGs negatively impacts rebalancing or
> data availability, what is a reasonable ceiling for this percentage?
> >
> > Thanks.
> >
> > -Dave
> >
> > --
> > Dave Hall
> > Binghamton University
> > kdhall@xxxxxxxxxxxxxx
> > 607-760-2328 (Cell)
> > 607-777-4641 (Office)
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


