question on reusing OSD

Hi,

I'm working to correct a partitioning error from when our cluster was
first installed (ceph 0.56.4, ubuntu 12.04).  This left us with 2TB
partitions for our OSDs, instead of the 2.8TB actually available on
disk, a 29% space hit.  (The error was due to a gdisk bug that
mis-computed the end of the disk during the ceph-disk-prepare and placed
the journal at the 2TB mark instead of the true end of the disk at
2.8TB. I've updated gdisk to a newer release that works correctly.)
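
For anyone checking their own disks, something like the following should
show the problem (device name is just an example):

sgdisk -p /dev/sdb    # print the GPT; in our case the journal partition
                      # sits at the 2TB mark instead of the true 2.8TB end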

I'd like to fix this problem by taking my existing 2TB OSDs offline one
at a time, repartitioning them, and then bringing them back into the
cluster.  Unfortunately I can't just grow the partitions (the journal
partition sits right at the 2TB mark, immediately after the data
partition), so the repartition will be destructive.

I would like the reformatted OSD to come back into the cluster looking
just like the original OSD, except that it now has 2.8TB for its data.
That is, I'd like the OSD number to stay the same and for the cluster to
treat it as the original disk (save for not having any data on it).

Ordinarily, I would add an OSD by bringing a system into the cluster and
triggering these two steps:

ceph-disk-prepare /dev/sdb /dev/sdb   # partitions the disk; note this is
                                      # an older cluster with the journal
                                      # on the same disk
ceph-disk-activate /dev/sdb           # registers the osd with the cluster

ceph-disk-prepare is focused on partitioning and doesn't interact with
the cluster.  ceph-disk-activate takes care of making the OSD look like
an OSD and adding it into the cluster.

Inside ceph-disk-activate, the code looks for some special files at the
top of the /dev/sdb1 file system, including magic, ceph_fsid, and whoami
(which is where the OSD number is stored).
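
For reference, on our boxes the top of the mounted data partition looks
roughly like this (osd id and mount point are just examples):

ls /var/lib/ceph/osd/ceph-12
# ceph_fsid  current/  fsid  journal  keyring  magic  whoami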

My first question is: can I preserve these special files and put them
back on the repartitioned/reformatted drive, causing ceph-disk-activate
to just bring the OSD back into the cluster under its original identity,
or is there a better way to do what I want?
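
Concretely, what I have in mind is something like this (untested; osd id,
mount point, and init commands are just illustrative):

service ceph stop osd.12       # or: stop ceph-osd id=12
mkdir /root/osd.12-backup
cp /var/lib/ceph/osd/ceph-12/{whoami,ceph_fsid,fsid,magic,keyring} \
   /root/osd.12-backup/
umount /var/lib/ceph/osd/ceph-12
# repartition with the fixed gdisk, mkfs a new xfs file system, remount,
# copy the saved files back, then re-run ceph-disk-activate (or just
# restart the osd) and see whether it comes up as osd.12 again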

My second question is: if I take an OSD out of the cluster, should I
wait for the subsequent rebalance to complete before bringing the
reformatted OSD back into the cluster?  That is, will it cause problems
to drop an OSD out of the cluster and then bring the same OSD back in,
except without any of its data?  I'm assuming this is similar to what
would happen in a standard disk replacement scenario.
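
The sequence I have in mind is roughly this (osd id illustrative):

ceph osd out 12    # cluster starts re-replicating osd.12's data elsewhere
ceph -w            # watch until recovery finishes and health is OK again
# ... repartition, reformat, restore the identity files, restart osd.12 ...
ceph osd in 12     # bring the (now empty) osd back in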

I reviewed the thread from Sept 2014
(https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg13394.html)
discussing a similar scenario.  That one was more focused on re-using a
journal slot on an SSD; in my case the journal is on the same disk as
the data.  Also, I don't have a recent release of Ceph, so I likely
won't benefit from the associated fix.

Thanks for any suggestions.

~jpr


