Re: Switching failure domains

I don't know of a non-impactful way to change this. If any host, rack, etc. IDs change, it will cause movement. If any crush rule changes where it chooses from, or changes the failure domain, it will cause movement.

I once ran a test cluster where I put every host into its own "rack" just so I could change the rule to choose from racks instead of hosts, and it moved all of the data even though the actual size of the failure domains didn't change.
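Roughly, that test amounted to something like this for each host (bucket names are placeholders):

    # wrap each host in its own rack bucket
    ceph osd crush add-bucket rack-host01 rack
    ceph osd crush move rack-host01 root=default
    ceph osd crush move host01 rack=rack-host01
    # ...then change the rule to choose from type rack instead of type host

and even that was enough to reshuffle everything.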

In Luminous, if you only have HDDs and you change the crush rule to choose from the default root restricted to class hdd, the majority of your data will move even though the OSDs all stay where they are and nothing else in the map changes.
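For reference, that change is roughly the following (rule and pool names are placeholders, Luminous only):

    # rule that chooses hosts under the default root, restricted to class hdd
    ceph osd crush rule create-replicated replicated_hdd default host hdd
    # point a pool at the new rule
    ceph osd pool set <pool> crush_rule replicated_hdd

Presumably the movement comes from the rule now drawing from the class's shadow hierarchy, whose bucket IDs differ from the originals.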

Afaik, all changes to the crush map like this will move all affected data around.
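You can at least limit how fast the data moves with the usual backfill knobs while a change like this settles, something like:

    # keep backfill/recovery slow across all OSDs
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # or pause backfill entirely while you sanity-check the new map
    ceph osd set nobackfill
    ceph osd unset nobackfill

but the movement itself is unavoidable.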


On Wed, Jan 31, 2018, 12:57 PM Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
We're looking into switching the failure domains on several of our
clusters from host-level to rack-level and I'm trying to figure out the
least impactful way to accomplish this.

First off, I've made this change before on a couple large (500+ OSDs)
OpenStack clusters where the volumes, images, and vms pools were all
about 33% of the cluster.  The way I did it then was to create a new
rule which had a switch-based failure domain and then did one pool at a
time.
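From memory, the steps per pool were roughly as follows (the rule name is a placeholder, the bucket type is whatever the new failure domain is, and on hammer/jewel the pool setting is crush_ruleset rather than crush_rule):

    # new rule whose failure domain is the new bucket type
    ceph osd crush rule create-simple rack_replicated default rack
    # repoint one pool at a time and let the cluster settle before the next
    ceph osd pool set volumes crush_ruleset <ruleset-id>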

That worked pretty well, but now I've inherited several large RGW
clusters (500-1000+ OSDs) where 99% of the data is in the .rgw.buckets
pool with slower and bigger disks (7200 RPM 4TB SATA HDDs vs. the 10k
RPM 1.2TB SAS HDDs I was using previously).  This makes the change take
longer, and early testing has shown it to be fairly impactful.

I'm wondering if there is a way to more gradually switch to a rack-based
failure domain?

One of the ideas we had was to create new hosts that are actually the
racks and gradually move all the OSDs to those hosts.  Once that is
complete we should be able to turn those hosts into racks and switch the
failure domain at the same time.
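In CLI terms I'm picturing something like this (names and weights are placeholders):

    # create a "host" bucket that will eventually become a rack
    ceph osd crush add-bucket rack1 host
    ceph osd crush move rack1 root=default
    # gradually move OSDs into it
    ceph osd crush create-or-move osd.12 3.64 root=default host=rack1
    # once everything is in place, change the bucket types to rack and
    # update the rule by editing a decompiled crush map:
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit bucket types and the rule in crushmap.txt, then:
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin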

Does anyone see a problem with that approach?

I was also wondering if we could take advantage of RGW in any way to
gradually move the data to a new pool with the proper failure domain set
on it?

BTW, these clusters will all be running jewel (10.2.10).  The previous
time I made this switch, the clusters were on hammer.

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
