We're looking into switching the failure domains on several of our clusters from host-level to rack-level, and I'm trying to figure out the least impactful way to accomplish this.

First off, I've made this change before on a couple of large (500+ OSDs) OpenStack clusters where the volumes, images, and vms pools were each about 33% of the cluster. The way I did it then was to create a new rule with a switch-based failure domain and then move one pool at a time over to it. That worked pretty well, but now I've inherited several large RGW clusters (500-1000+ OSDs) where 99% of the data is in the .rgw.buckets pool, on slower and bigger disks (7200 RPM 4TB SATA HDDs vs. the 10k RPM 1.2TB SAS HDDs I was using previously). This makes the change take longer, and early testing has shown it to be fairly impactful.

I'm wondering if there is a way to switch to a rack-based failure domain more gradually. One idea we had was to create new host buckets that actually represent the racks and gradually move all the OSDs into them. Once that is complete, we should be able to turn those hosts into racks and switch the failure domain at the same time. Does anyone see a problem with that approach?

I was also wondering if we could take advantage of RGW in any way to gradually move the data to a new pool with the proper failure domain set on it.

BTW, these clusters will all be running jewel (10.2.10). When I made the switch previously, it was on hammer.

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
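The "new hosts that are actually the racks" idea from the post can be sketched as a dry-run planner. This is a minimal illustration under stated assumptions, not a tested procedure: it only builds jewel-era `ceph` CLI command strings and executes nothing. The inventory tuple format, the `plan_moves` helper, the rack and node names, the backfill throttle value of 1, and the choice to batch by whole physical host are all hypothetical. Batching by physical host is there so that, between batches, no host's OSDs are split across its old bucket and a rack-named bucket, which could otherwise let two replicas land on the same physical machine. It also assumes no buckets named after the racks exist yet.

```python
from itertools import groupby
from typing import List, Tuple

# Hypothetical inventory entry: (osd_id, crush_weight, physical_host, rack).
# Weights would come from `ceph osd tree`; the rack mapping is supplied by you.
OsdEntry = Tuple[int, float, str, str]


def plan_moves(osds: List[OsdEntry], hosts_per_batch: int,
               root: str = "default") -> List[List[str]]:
    """Return batches of ceph CLI commands as strings; nothing is executed."""
    if hosts_per_batch < 1:
        raise ValueError("hosts_per_batch must be >= 1")

    racks = sorted({rack for _, _, _, rack in osds})
    # Step 0: throttle backfill (value is an assumption), then create one
    # host-type bucket named after each rack and link it under the root.
    # Empty buckets carry no weight, so creating them moves no data.
    setup = ["ceph tell 'osd.*' injectargs '--osd-max-backfills 1'"]
    for rack in racks:
        setup.append(f"ceph osd crush add-bucket {rack} host")
        setup.append(f"ceph osd crush move {rack} root={root}")

    # Group OSDs by physical host (sorted() is stable, so OSD order within
    # a host is preserved) so each host moves as a unit.
    ordered = sorted(osds, key=lambda e: e[2])
    by_host = [list(g) for _, g in groupby(ordered, key=lambda e: e[2])]

    batches = [setup]
    # Each later batch triggers rebalancing; the idea is to wait until all
    # PGs are active+clean before running the next one.
    for start in range(0, len(by_host), hosts_per_batch):
        cmds = []
        for host_osds in by_host[start:start + hosts_per_batch]:
            for osd_id, weight, _host, rack in host_osds:
                cmds.append(f"ceph osd crush set osd.{osd_id} {weight:.5f} "
                            f"root={root} host={rack}")
        batches.append(cmds)
    return batches


if __name__ == "__main__":
    # Hypothetical example: two nodes in two racks, two ~4TB OSDs each.
    inventory = [(0, 3.63899, "node1", "rack1"), (1, 3.63899, "node1", "rack1"),
                 (2, 3.63899, "node2", "rack2"), (3, 3.63899, "node2", "rack2")]
    for step, cmds in enumerate(plan_moves(inventory, hosts_per_batch=1)):
        print(f"# step {step}")
        for cmd in cmds:
            print(cmd)
```

Two caveats, assuming default settings. On jewel an OSD re-registers its CRUSH location under its own hostname when it starts unless `osd crush update on start = false` is set in ceph.conf, so a restart could silently undo these moves. The final step of turning the rack-named host buckets into `rack` buckets and switching the rule to a rack failure domain is not covered by the sketch; on jewel that would typically mean editing a decompiled map (`ceph osd getcrushmap`, `crushtool -d`, `crushtool -c`, `ceph osd setcrushmap`).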