Re: Advice on OSD upgrades

Sadly, this is not an option. Not only are there no free slots on the hosts, but we're also going down in size on each OSD, since we decided to sacrifice capacity to make a significant jump in drive quality.

We're not really too concerned about the rebalancing, as we monitor the cluster closely and have enough breathing room to absorb the impact as long as we're methodical and measured about it.
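
For illustration only (example values, not our actual settings), "measured" for us mostly means throttling backfill and watching recovery between steps, along the lines of:

    # throttle backfill/recovery so client I/O isn't starved (values are examples)
    ceph tell osd.\* injectargs '--osd-max-backfills 1 --osd-recovery-max-active 1'
    # watch progress between steps
    ceph -s
    ceph osd df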

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727

On Apr 14, 2016, at 6:45 AM, koukou73gr wrote:

If you have empty drive slots in your OSD hosts, I'd be tempted to
insert the new drive in a spare slot, set noout, shut down one OSD, unmount
its OSD directory, dd the old drive to the new one, remove the old drive, and restart the OSD.

No rebalancing and minimal data movement when the OSD rejoins.
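
Roughly like this (the OSD id, device names and service commands are placeholders; adjust for your release):

    ceph osd set noout                   # keep CRUSH from re-mapping while the OSD is down
    systemctl stop ceph-osd@12           # or: stop ceph-osd id=12 on upstart systems
    umount /var/lib/ceph/osd/ceph-12
    dd if=/dev/sdX of=/dev/sdY bs=4M conv=noerror,sync
    # pull the old drive, let the clone mount in its place, then:
    systemctl start ceph-osd@12
    ceph osd unset noout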

-K.

On 04/14/2016 04:29 PM, Stephen Mercier wrote:
Good morning,

We've been running a medium-sized (88 OSDs - all SSD) Ceph cluster for
the past 20 months. We're very happy with our experience with the
platform so far.

Shortly, we will be embarking on an initiative to replace all 88 OSDs
with new drives (planned maintenance and lifecycle replacement). Before
we do so, however, I wanted to confirm with the community the proper
order of operations for such a task.

The OSDs are divided evenly across an even number of hosts, which are
in turn divided evenly between 2 cabinets in 2 physically separate
locations. The plan is to replace the OSDs one host at a time, cycling
back and forth between cabinets and replacing one host per week, or every 2
weeks, depending on how long the CRUSH rebalancing takes.

For each host, the plan was to mark the OSDs as out, one at a time,
closely monitoring each of them and moving to the next OSD once the current
one has balanced out. Once all OSDs are successfully marked as out, we
will then delete those OSDs from the cluster, shut down the server,
replace the physical drives, and, once rebooted, add the new drives to
the cluster as new OSDs using the same method we've used previously,
doing so one at a time to allow for rebalancing as they rejoin the cluster.
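
In command terms, per OSD, I'm picturing roughly the following (the OSD id is a placeholder, and the re-add step depends on how the OSDs were originally deployed):

    ceph osd out 12                      # let data drain; watch 'ceph -s' until recovery completes
    # once fully rebalanced out, remove it from the cluster
    systemctl stop ceph-osd@12           # or: stop ceph-osd id=12
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12
    # after the physical swap and reboot, bring the new drives in one at a time
    # (e.g. ceph-disk prepare/activate or ceph-deploy osd create), waiting for
    # backfill to finish before the next one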

My questions are: Does this process sound correct? Should I also mark the
OSDs as down when I mark them as out? Are there any steps I'm
overlooking in this process?

Any advice is greatly appreciated.

Cheers,
-
Stephen Mercier | Sr. Systems Architect
Attainia Capital Planning Solutions (ACPS)
O: (650)241-0567, 727 | TF: (866)288-2464, 727
stephen.mercier@xxxxxxxxxxxx | www.attainia.com



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
