Good morning,

We've been running a medium-sized (88 OSDs, all SSD) Ceph cluster for the past 20 months, and we're very happy with our experience with the platform so far.

Shortly, we will be embarking on an initiative to replace all 88 OSDs with new drives (planned maintenance and lifecycle replacement). Before we do so, however, I wanted to confirm with the community the proper order of operations for such a task.

The OSDs are divided evenly across an even number of hosts, which are in turn divided evenly between 2 cabinets in 2 physically separate locations. The plan is to replace the OSDs one host at a time, cycling back and forth between cabinets, replacing one host per week or every 2 weeks (depending on how long the CRUSH rebalancing takes).

For each host, the plan is to mark the OSDs as out one at a time, closely monitoring each of them and moving to the next OSD once the current one has been balanced out. Once all OSDs on the host are successfully marked as out, we will delete those OSDs from the cluster, shut down the server, and replace the physical drives. Once the server is rebooted, we will add the new drives to the cluster as new OSDs using the same method we've used previously, doing so one at a time to allow for rebalancing as they join the cluster.

My questions are:

Does this process sound correct?
Should I also mark the OSDs as down when I mark them as out?
Are there any steps I'm overlooking in this process?

Any advice is greatly appreciated.

Cheers,
Stephen Mercier | Sr. Systems Architect Attainia Capital Planning Solutions (ACPS) O: (650)241-0567, 727 | TF: (866)288-2464, 727 |
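
P.S. For reference, the per-OSD retirement sequence described above could be sketched roughly as follows. This is only a minimal sketch: the helper name drain_and_remove_commands and the example OSD id are hypothetical, it only builds the command strings and does not run anything, and the ordering assumes the usual manual removal sequence (out, wait for rebalance, stop the daemon, then remove from CRUSH, auth, and the OSD map). The command that stops the daemon depends on the init system, so it is left as a comment line rather than guessed.

```python
from typing import List


def drain_and_remove_commands(osd_id: int) -> List[str]:
    """Return, in order, the ceph CLI commands for retiring one OSD.

    Hypothetical helper for illustration: it only builds strings and
    never executes them.
    """
    name = f"osd.{osd_id}"
    return [
        # Mark the OSD out so CRUSH starts migrating its data elsewhere.
        f"ceph osd out {osd_id}",
        # Re-run until the cluster reports HEALTH_OK before going further.
        "ceph health",
        # Init-system dependent step, deliberately left as a comment.
        f"# stop the ceph-osd daemon for {name} on its host",
        # Remove the OSD from the CRUSH map.
        f"ceph osd crush remove {name}",
        # Delete the OSD's authentication key.
        f"ceph auth del {name}",
        # Finally remove the OSD from the OSD map.
        f"ceph osd rm {osd_id}",
    ]


if __name__ == "__main__":
    # Example: print the sequence for a hypothetical OSD id 12.
    for command in drain_and_remove_commands(12):
        print(command)
```

Printing the list for each OSD on a host, one OSD at a time, gives a checklist that matches the one-at-a-time approach in the plan above.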