We recently elected to rebuild a cluster from the bottom up (move all the data off and move it back). I used cephadm to create the new cluster and then compared it with another cluster of similar size that had been migrated/adopted into cephadm. I see lots of differences. I gather this is due to the adoption process, but I can't find any information on "finishing" the adoption in a way that makes the two indistinguishable.

When I add new drives on the rebuilt cluster (say cluster A), as soon as the drive shows up in the OS, the cluster picks it up and adds it. On the migrated cluster (say cluster B), the drive just sits there until I add it manually.

I had to manually create all the crash, performance, alerting and node-exporter services on cluster B; they were already set up on cluster A. I just used labels for placement, and '*' for node-exporter and crash (rough commands appended at the end of this mail).

On cluster B the osd service shows as unmanaged, and when I add new hosts it creates a service for each host, e.g. "osd.dashboard-cephdashboard-randomnumbershere" with the host name in the placement field. Why wouldn't the adoption process create one service and mark it so that all OSDs are included, instead of leaving it unmanaged (did I miss something)? I've appended a rough spec of what I think that single managed service should look like.

The other issue I have had is that the imported monitors seem to be unstable. The only way to fix them was to delete the assignment and then kill any containers that were left running (this happened for the mon and mgrs on two imported servers). The imported nodes were fighting for control of the active mgr; I figured this out after half the disks in the cluster were marked out/down when I rebooted the two mon/mgr nodes during a RAM upgrade of the ESXi hosts (we were cycling through each ESXi server they are spread across). I had to halt the RAM upgrade while the cluster was fixed. After the rebuild, the instances are no longer fighting for control. The only way to get a handle on the situation was to keep the two adopted nodes offline until I could remove their daemon assignments in the Ceph dashboard (probably could have done it via the command line too; the equivalent commands are appended below).

The last item is an odd one. Because the Grafana server is on a different node than the mgr, all the performance graphs show a page error on cluster B. In cluster A it is also on a different node, but the pages show fine. For cluster B, I have to navigate to the Grafana server directly in the browser and accept the certificate risk in order to get the graphs to show up. I would imagine a valid cert would fix this, but I don't see any information on injecting one into the container (maybe my google-fu is failing me). I know how to do it on docker/podman with container startup YAMLs; the config-key approach I'm planning to try is appended below as well.

Thoughts? Is there a good guide out there for doing an adoption and fully flushing out all the settings to best practices (assuming the cephadm spin-up script is itself using best practices)?

Regards,
-Brent

Existing clusters:
Test: Quincy 17.2.3 (all virtual on NVMe)
US Production (HDD): Octopus 15.2.16 with 11 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
UK Production (HDD): Nautilus 14.2.22 with 18 osd servers, 3 mons, 4 gateways, 2 iscsi gateways
US Production (SSD): Quincy 17.2.3 cephadm with 6 osd servers, 5 mons, 4 gateways
UK Production (SSD): Quincy 17.2.3 cephadm with 7 osd servers, 5 mons, 4 gateways
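P.S. Appending the sketches referenced above. First, the OSD service: this is roughly what I expect a single managed, catch-all spec to look like, based on how cluster A behaves. It is untested as written; the service_id is just a placeholder, and the values a real cluster exports will differ:

    # see what cephadm currently has for OSD services
    ceph orch ls osd --export

    # osd-spec.yaml: one managed service that claims every clean, available device
    service_type: osd
    service_id: all_available_devices     # placeholder name
    placement:
      host_pattern: '*'
    unmanaged: false
    spec:
      data_devices:
        all: true

    # apply the spec
    ceph orch apply -i osd-spec.yaml

My assumption is that the leftover per-host "osd.dashboard-..." services can then be dropped with "ceph orch rm <service_name>" without touching the OSD daemons themselves, but I would try that on one service first.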
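The monitoring pieces were deployed by hand with something like the following. I'm quoting from memory, and I'm assuming "performance" and "alerting" above map to the prometheus and alertmanager services; "monitoring" is a stand-in for whatever label is already on the hosts:

    ceph orch apply crash '*'
    ceph orch apply node-exporter '*'
    ceph orch apply prometheus --placement="label:monitoring"
    ceph orch apply alertmanager --placement="label:monitoring"
    ceph orch apply grafana --placement="label:monitoring"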
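For the unstable adopted mon/mgr daemons, what I did through the dashboard should be roughly equivalent to the following CLI sequence (hostnames, the mgr suffix and the fsid are placeholders; I have not re-run this exact sequence):

    # check what the orchestrator thinks is deployed
    ceph orch ps --daemon-type mon
    ceph orch ps --daemon-type mgr

    # remove the misbehaving adopted daemons from the orchestrator
    ceph orch daemon rm mon.<badhost> --force
    ceph orch daemon rm mgr.<badhost>.<suffix> --force

    # if a container is still running on the host afterwards, clean it up there
    cephadm rm-daemon --name mon.<badhost> --fsid <cluster-fsid> --force

    # force a mgr failover if the active mgr is the one misbehaving
    ceph mgr fail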
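Finally, the Grafana certificate. Rather than injecting anything into the container by hand, my plan is to push a proper cert/key through the mgr config-key store and let cephadm regenerate the container config. The key names below are what the cephadm monitoring docs show; I haven't confirmed them on Quincy, so treat this as a sketch:

    # store the certificate and key for grafana
    ceph config-key set mgr/cephadm/grafana_crt -i grafana.crt
    ceph config-key set mgr/cephadm/grafana_key -i grafana.key

    # have cephadm rewrite the grafana config with the new cert
    ceph orch reconfig grafana

    # point the dashboard at grafana and relax/enforce cert checking as appropriate
    ceph dashboard set-grafana-api-url https://<grafana-host>:3000
    ceph dashboard set-grafana-api-ssl-verify False   # switch back to True once the cert is trusted

I believe newer releases also have "ceph dashboard set-grafana-frontend-api-url" to control the URL the browser itself is sent to, which may be the more relevant knob for the embedded graphs.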