There is also another option that I like to use:

- create a disjoint new crush root
- set nodown, noout, norebalance, nobackfill
- possibly insert PG merging here; see below
- osd crush move the hosts to the new root
- wait for peering to finish
- unset nodown, noout, norebalance, nobackfill

This will avoid *any* duplicate data movement. Setting the crush weight to 0 and then removing the OSDs after draining *will* lead to a second data movement after removing the OSDs. The crush placements will change again (at least that is what happened on my cluster when using the weight=0 drain+remove procedure), because these OSDs are still in the crush root and influence the hashing algorithm.

In principle you should be able to do *everything* in one go by doing the PG merging in the step I indicated above. Then you only have a couple of peering storms and exactly one data movement into the final locations.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Matt Vandermeulen <storage@xxxxxxxxxxxx>
Sent: 21 February 2022 23:31:49
To: Jason Borden
Cc: ceph-users@xxxxxxx
Subject: Re: Reducing ceph cluster size in half

This might be easiest to work through in two steps: draining hosts, and doing a PG merge. You can do them in either order (though, thinking about it, doing the merge first will give you more cluster-wide resources to do it faster).

Draining the hosts can be done in a few ways, too. If you want to do it in one shot, you can set nobackfill, then set the crush/reweights for the OSDs to zero, let the peering storm settle, and unset nobackfill. This is probably the easiest option if a brief peering storm and backfill_wait isn't a concern.

If you want to reduce backfill_wait PGs, you can use something like `pgremapper drain`, but this will likely involve multiple data movements: the initial drain is fine, but the CRUSH removal of hosts will cause the upmaps to be lost (which can be cancelled away with `pgremapper cancel-backfill`). Additional data movement will be needed if you want to `pgremapper undo-upmaps` to clean up what was cancelled (or if you use the balancer and it wants to move things).

On 2022-02-21 17:58, Jason Borden wrote:
> Hi all,
>
> I'm looking for some advice on reducing my ceph cluster by half. I
> currently have 40 hosts and 160 OSDs on a cephadm-managed Pacific
> cluster. The storage space is only 12% utilized. I want to reduce the
> cluster to 20 hosts and 80 OSDs while keeping the cluster operational.
> I'd prefer to do this in as few operations as possible instead of
> draining one host at a time and having to rebalance PGs 20 times. I
> think I should probably halve the number of PGs at the same time too.
> Does anyone have any advice on how I can safely achieve this?
>
> Thanks,
> Jason
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
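
For illustration, a minimal sketch of the disjoint-root procedure described at the top of the thread, using only standard ceph CLI commands. The root name (root-new), the host names (host01 ... host20), the pool name and the target pg_num are all placeholders; note also that the pools' CRUSH rules must select from the new root for PGs to map there, and adjusting the rules is not shown.

  # create a disjoint new crush root (name is a placeholder)
  ceph osd crush add-bucket root-new root

  # stop data from shuffling while CRUSH is being reorganised
  ceph osd set nodown
  ceph osd set noout
  ceph osd set norebalance
  ceph osd set nobackfill

  # optional: merge PGs now, e.g. halve pg_num on a pool
  # (pool name and number are placeholders)
  ceph osd pool set mypool pg_num 512

  # move the hosts that will remain under the new root (repeat per host)
  ceph osd crush move host01 root=root-new
  ceph osd crush move host20 root=root-new

  # wait for peering to finish, then re-enable recovery
  ceph osd unset nobackfill
  ceph osd unset norebalance
  ceph osd unset noout
  ceph osd unset nodown

With the flags set, the crush moves and the pg_num change only re-peer the PGs; the actual data movement starts once the flags are unset, which is what keeps it to a single movement into the final locations.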
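
Similarly, a rough sketch of the one-shot drain Matt describes (OSD ids are placeholders; assumes a brief peering storm and a long backfill_wait queue are acceptable). `ceph osd crush reweight` is shown here; `ceph osd reweight` is the other knob covered by "crush/reweights".

  # hold backfill while the weights change
  ceph osd set nobackfill

  # zero the CRUSH weight of every OSD on the hosts being removed (repeat per OSD)
  ceph osd crush reweight osd.80 0
  ceph osd crush reweight osd.159 0

  # once the peering storm settles, let the data drain off in one pass
  ceph osd unset nobackfill

As Frank notes above, the drained OSDs are still in the CRUSH tree at this point, so removing them afterwards can shift placements a second time; the disjoint-root variant avoids that.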