Re: Switching failure domains

I don't know of a non-impactful way to change this. If any host, rack, etc. IDs change, it will cause movement. If any crush rule changes where it chooses from, or changes the failure domain, it will cause movement.

I once ran a test cluster where I put every host into its own "rack" just so I could change the rule to choose from racks instead of hosts, and it moved all of the data even though the actual size of the failure domains didn't change.
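Roughly, that test amounted to something like this for each host (bucket names are placeholders):

    # wrap each host in its own rack bucket
    ceph osd crush add-bucket rack-host01 rack
    ceph osd crush move rack-host01 root=default
    ceph osd crush move host01 rack=rack-host01
    # ...then change the rule to choose from type rack instead of type host

and even that was enough to reshuffle everything.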

In Luminous, if you only have HDDs and you change the crush rule to choose from the default root restricted to class hdd, the majority of your data will move even though the OSDs all stay where they are and nothing else in the map changes.
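For reference, that change is roughly the following (rule and pool names are placeholders, Luminous only):

    # rule that chooses hosts under the default root, restricted to class hdd
    ceph osd crush rule create-replicated replicated_hdd default host hdd
    # point a pool at the new rule
    ceph osd pool set <pool> crush_rule replicated_hdd

Presumably the movement comes from the rule now drawing from the class's shadow hierarchy, whose bucket IDs differ from the originals.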

Afaik, all changes to the crush map like this will move all affected data around.
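You can at least limit how fast the data moves with the usual backfill knobs while a change like this settles, something like:

    # keep backfill/recovery slow across all OSDs
    ceph tell osd.* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # or pause backfill entirely while you sanity-check the new map
    ceph osd set nobackfill
    ceph osd unset nobackfill

but the movement itself is unavoidable.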


On Wed, Jan 31, 2018, 12:57 PM Bryan Stillwell <bstillwell@xxxxxxxxxxx> wrote:
We're looking into switching the failure domains on several of our
clusters from host-level to rack-level and I'm trying to figure out the
least impactful way to accomplish this.

First off, I've made this change before on a couple large (500+ OSDs)
OpenStack clusters where the volumes, images, and vms pools were all
about 33% of the cluster.  The way I did it then was to create a new
rule which had a switch-based failure domain and then did one pool at a
time.
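From memory, the steps per pool were roughly as follows (the rule name is a placeholder, the bucket type is whatever the new failure domain is, and on hammer/jewel the pool setting is crush_ruleset rather than crush_rule):

    # new rule whose failure domain is the new bucket type
    ceph osd crush rule create-simple rack_replicated default rack
    # repoint one pool at a time and let the cluster settle before the next
    ceph osd pool set volumes crush_ruleset <ruleset-id>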

That worked pretty well, but now I've inherited several large RGW
clusters (500-1000+ OSDs) where 99% of the data is in the .rgw.buckets
pool with slower and bigger disks (7200 RPM 4TB SATA HDDs vs. the 10k
RPM 1.2TB SAS HDDs I was using previously).  This makes the change take
longer, and early testing has shown it to be fairly impactful.

I'm wondering if there is a way to more gradually switch to a rack-based
failure domain?

One of the ideas we had was to create new hosts that are actually the
racks and gradually move all the OSDs to those hosts.  Once that is
complete we should be able to turn those hosts into racks and switch the
failure domain at the same time.
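In CLI terms I'm picturing something like this (names and weights are placeholders):

    # create a "host" bucket that will eventually become a rack
    ceph osd crush add-bucket rack1 host
    ceph osd crush move rack1 root=default
    # gradually move OSDs into it
    ceph osd crush create-or-move osd.12 3.64 root=default host=rack1
    # once everything is in place, change the bucket types to rack and
    # update the rule by editing a decompiled crush map:
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # edit bucket types and the rule in crushmap.txt, then:
    crushtool -c crushmap.txt -o crushmap-new.bin
    ceph osd setcrushmap -i crushmap-new.bin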

Does anyone see a problem with that approach?

I was also wondering if we could take advantage of RGW in any way to
gradually move the data to a new pool with the proper failure domain set
on it?

BTW, these clusters will all be running jewel (10.2.10).  The previous
time I made this switch, the clusters were on hammer.

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
