Re: Advice on OSD upgrades

> On 14 April 2016 at 15:29, Stephen Mercier
> <stephen.mercier@xxxxxxxxxxxx> wrote:
> 
> 
> Good morning,
> 
> We've been running a medium-sized (88 OSDs - all SSD) ceph cluster for the
> past 20 months. We're very happy with our experience with the platform so far.
> 
> Shortly, we will be embarking on an initiative to replace all 88 OSDs with new
> drives (Planned maintenance and lifecycle replacement). Before we do so,
> however, I wanted to confirm with the community as to the proper order of
> operation to perform such a task.
> 
> The OSDs are divided evenly across an even number of hosts which are then
> divided evenly between 2 cabinets in 2 physically separate locations. The plan
> is to replace the OSDs, one host at a time, cycling back and forth between
> cabinets, replacing one host per week, or every 2 weeks (Depending on the
> amount of time the crush rebalancing takes).
> 

I assume your replication is set to "2" and that you replicate over the two
locations?

In that case, only work on the OSDs in the first location, and start on the
second location after you have replaced them all.
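
If you want to double-check that first, something along these lines should
show it (the pool name "rbd" is only an example, use your own pool names):

    $ ceph osd pool get rbd size     # replication factor of the pool
    $ ceph osd crush rule dump       # how replicas are spread over the CRUSH tree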

> For each host, the plan was to mark the OSDs as out, one at a time, closely
> monitoring each of them, moving to the next OSD once the current one is
> balanced out. Once all OSDs are successfully marked as out, we will then
> delete those OSDs from the cluster, shutdown the server, replace the physical
> drives, and once rebooted, add the new drives to the cluster as new OSDs using
> the same method we've used previously, doing so one at a time to allow for
> rebalancing as they rejoin the cluster.
> 
> My questions are…Does this process sound correct? Should I also mark the OSDs
> as down when I mark them as out? Are there any steps I'm overlooking in this
> process?
> 

No, marking them out is just fine. That tells CRUSH the OSD is no longer
participating in the data placement. Its effective weight will be 0 and that's
it.

Like others mentioned, reweight the OSD's CRUSH weight to 0 at the same time
you mark it out. That way you prevent a double rebalance: one when the OSD is
marked out and a second one when it is later removed from the CRUSH map.
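
A minimal sketch of what that looks like (OSD id 12 is just a placeholder,
substitute your own ids):

    $ ceph osd crush reweight osd.12 0    # set CRUSH weight to 0, data starts moving off
    $ ceph osd out 12                     # take it out of data placement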

Keep it marked as UP so that it can help in migrating the PGs to other nodes.
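
Once the cluster reports all PGs as active+clean again, and the OSD daemon has
been stopped, you can remove the old OSD completely, roughly like this (again
with id 12 as a placeholder):

    $ ceph osd crush remove osd.12    # remove it from the CRUSH map
    $ ceph auth del osd.12            # delete its authentication key
    $ ceph osd rm 12                  # remove the OSD itself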

> Any advice is greatly appreciated.
> 
> Cheers,
> -
> Stephen Mercier | Sr. Systems Architect
> Attainia Capital Planning Solutions (ACPS)
> O: (650)241-0567, 727 | TF: (866)288-2464, 727
> stephen.mercier@xxxxxxxxxxxx | www.attainia.com
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



