Nick/Dennis,

Thanks for the info. I did fiddle with a location script that would determine whether a drive is a spinning or SSD drive and put it in the appropriate bucket. I might move back to that now that I understand ceph better. Thanks for the link to the sample script as well.

Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

From: Nick Fisk<mailto:nick@xxxxxxxxxx>
Sent: Thursday, September 15, 2016 3:40 AM
To: Jim Kilborn<mailto:jim@xxxxxxxxxxxx>; 'Reed Dier'<mailto:reed.dier@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
Subject: RE: Replacing a failed OSD

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Jim Kilborn
> Sent: 14 September 2016 20:30
> To: Reed Dier <reed.dier@xxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Replacing a failed OSD
>
> Reed,
>
> Thanks for the response.
>
> Your process is the one that I ran. However, I have a crush map with SSD and SATA drives in different buckets (each host is split into host-type buckets, with an SSD and a spinning host type per host), because I am using SSD drives as a replicated cache tier in front of an erasure-coded data pool for cephfs.
>
> I have "osd crush update on start = false" set so that OSDs don't get added to the crush map automatically, because ceph wouldn't know where to put them.
>
> I am using puppet (the ceph puppet module) to provision a drive when it sees one in a slot without a ceph signature (I guess).
>
> The real confusion is why I have to remove the OSD from the crush map. Once I remove it, the replacement does come up with the same OSD number, but it's not in the crush map, so I have to put it back where it belongs. It just seems strange that it must be removed from the crush map first.
>
> Basically, I export the crush map, remove the OSD from it, then redeploy the drive. Once the replacement is up and running with the same OSD number, I import the exported crush map to get it back into the cluster.
>
> I guess that is just how it has to be done.

You can pass a script in via the 'osd crush location hook' variable so that the OSDs automatically get placed in the right location when they start up. Thanks to Wido there is already a script that you can probably use with very few modifications (a rough sketch of such a hook is also included at the end of this message):

https://gist.github.com/wido/5d26d88366e28e25e23d

> Thanks again
>
> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
> From: Reed Dier<mailto:reed.dier@xxxxxxxxxxx>
> Sent: Wednesday, September 14, 2016 1:39 PM
> To: Jim Kilborn<mailto:jim@xxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Replacing a failed OSD
>
> Hi Jim,
>
> This is pretty fresh in my mind, so hopefully I can help you out here.
>
> Firstly, the crush map will backfill any existing holes in the enumeration. So assuming only one drive has been removed from the crush map, the replacement will get the same OSD number.
>
> My steps for removing an OSD, run from the host node:
>
> ceph osd down osd.i
> ceph osd out osd.i
> stop ceph-osd id=i
> umount /var/lib/ceph/osd/ceph-i
> ceph osd crush remove osd.i
> ceph auth del osd.i
> ceph osd rm osd.i
>
> From here, the disk is removed from the ceph cluster and crush map, and is ready for physical removal and replacement.
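Since the removal sequence above is run the same way every time, it can be wrapped in a small helper script. This is only a minimal sketch under a few assumptions: the script name and the idea of passing the OSD id as the first argument are hypothetical, and the 'stop ceph-osd id=N' line is the upstart form used above (on a systemd host the equivalent would be 'systemctl stop ceph-osd@N'):

    #!/bin/bash
    # remove-osd.sh (hypothetical helper): remove a failed OSD by its numeric id.
    # Usage: ./remove-osd.sh 12   (run on the host that owns the OSD)
    set -e
    ID="$1"

    ceph osd down "osd.${ID}"           # mark the OSD down
    ceph osd out "osd.${ID}"            # mark it out so data rebalances away
    stop ceph-osd id="${ID}"            # stop the daemon (upstart; systemd: systemctl stop ceph-osd@${ID})
    umount "/var/lib/ceph/osd/ceph-${ID}"
    ceph osd crush remove "osd.${ID}"   # remove it from the crush map
    ceph auth del "osd.${ID}"           # delete its cephx key
    ceph osd rm "osd.${ID}"             # remove the OSD id from the cluster

After this the disk can be pulled and the replacement prepared as described below.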
> From there I deploy the new OSD with ceph-deploy from my admin node using:
>
> ceph-deploy disk list nodei
> ceph-deploy disk zap nodei:sdX
> ceph-deploy --overwrite-conf osd prepare nodei:sdX
>
> This will prepare the disk and insert it back into the crush map, bringing it back up and in. The OSD number should remain the same, as it will fill the gap left by the previous OSD removal.
>
> Hopefully this helps,
>
> Reed
>
> On Sep 14, 2016, at 11:00 AM, Jim Kilborn <jim@xxxxxxxxxxxx> wrote:
>
> > I am finishing testing our new cephfs cluster and wanted to document a failed OSD procedure.
> > I noticed that when I pulled a drive to simulate a failure and ran through the replacement steps, the OSD had to be removed from the crush map in order to initialize the new drive as the same OSD number.
> >
> > Is it correct that I have to remove it from the crush map and then, after the OSD is initialized and mounted, add it back to the crush map? Is there no way to have it reuse the same OSD # without removing it from the crush map?
> >
> > Thanks for taking the time.
> >
> > - Jim

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
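For reference, here is a rough sketch of the kind of 'osd crush location hook' Nick suggests above. It is not Wido's script from the gist; the device lookup and the ssd/sata root and host bucket names are assumptions that would have to be adapted to the actual crush tree, and the trailing-digit device parsing only handles simple /dev/sdX partitions:

    #!/bin/bash
    # Hypothetical crush location hook: place an OSD under an ssd or sata bucket
    # depending on whether its backing disk is rotational. This is NOT Wido's
    # script from the gist above, just an illustration of the same idea.
    #
    # Ceph calls the hook with arguments like: --cluster <name> --id <osd-id> --type osd
    # and expects a crush location ("key=value ...") on stdout.

    while [ $# -gt 0 ]; do
        case "$1" in
            --id) ID="$2"; shift ;;
        esac
        shift
    done

    HOST=$(hostname -s)

    # Resolve the block device backing this OSD's data directory, then check the
    # kernel's rotational flag (0 = SSD, 1 = spinning disk).
    DEV=$(df "/var/lib/ceph/osd/ceph-${ID}" | awk 'NR==2 {print $1}')
    DISK=$(basename "${DEV}" | sed 's/[0-9]*$//')   # e.g. /dev/sdb1 -> sdb
    ROT=$(cat "/sys/block/${DISK}/queue/rotational" 2>/dev/null)

    # The bucket names below (<host>-ssd / <host>-sata, roots ssd / sata) are
    # examples only and must match buckets that already exist in the crush map.
    if [ "${ROT}" = "0" ]; then
        echo "host=${HOST}-ssd root=ssd"
    else
        echo "host=${HOST}-sata root=sata"
    fi

To use something like this, the script path would be set in ceph.conf via 'osd crush location hook = /path/to/hook.sh' (path hypothetical), and 'osd crush update on start' would need to be left at its default of true so the reported location is actually applied when the OSD starts.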