Nick/Dennis,

Thanks for the info. I did fiddle with a location script that would determine whether a drive is a spinning or SSD drive and put it in the appropriate bucket. I might move back to that now that I understand ceph better. Thanks for the link to the sample script as well.

Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10

From: Nick Fisk<mailto:nick@xxxxxxxxxx>
Sent: Thursday, September 15, 2016 3:40 AM
To: Jim Kilborn<mailto:jim@xxxxxxxxxxxx>; 'Reed Dier'<mailto:reed.dier@xxxxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
Subject: RE: Replacing a failed OSD

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Jim Kilborn
> Sent: 14 September 2016 20:30
> To: Reed Dier <reed.dier@xxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Replacing a failed OSD
>
> Reed,
>
> Thanks for the response.
>
> Your process is the one that I ran. However, I have a crush map with SSD and SATA drives in different buckets (each host is split into host-type buckets, with an SSD and a spinning host type per host), because I am using SSD drives as a replicated cache tier in front of an erasure-coded data pool for cephfs.
>
> I have "osd crush update on start = false" set so that OSDs don't get added to the crush map automatically, because ceph wouldn't know where to put them.
>
> I am using puppet (the ceph puppet module) to provision a drive when it sees one in a slot without a ceph signature (I guess).
>
> The real confusion is why I have to remove the OSD from the crush map. Once I remove it, the replacement does come up with the same OSD number, but it's not in the crush map, so I have to put it back where it belongs. It just seems strange that it must be removed from the crush map first.
>
> Basically, I export the crush map, remove the OSD from it, then redeploy the drive. Once the replacement is up and running with the same OSD number, I import the exported crush map to get it back into the cluster.
>
> I guess that is just how it has to be done.

You can pass a script in via the 'osd crush location hook' variable so that the OSDs automatically get placed in the right location when they start up. Thanks to Wido there is already a script that you can probably use with very few modifications (a rough sketch of such a hook is also included at the end of this message):

https://gist.github.com/wido/5d26d88366e28e25e23d

> Thanks again
>
> Sent from Mail<https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
> From: Reed Dier<mailto:reed.dier@xxxxxxxxxxx>
> Sent: Wednesday, September 14, 2016 1:39 PM
> To: Jim Kilborn<mailto:jim@xxxxxxxxxxxx>
> Cc: ceph-users@xxxxxxxxxxxxxx<mailto:ceph-users@xxxxxxxxxxxxxx>
> Subject: Re: Replacing a failed OSD
>
> Hi Jim,
>
> This is pretty fresh in my mind, so hopefully I can help you out here.
>
> Firstly, the crush map will backfill any existing holes in the enumeration. So assuming only one drive has been removed from the crush map, the replacement will get the same OSD number.
>
> My steps for removing an OSD, run from the host node:
>
> ceph osd down osd.i
> ceph osd out osd.i
> stop ceph-osd id=i
> umount /var/lib/ceph/osd/ceph-i
> ceph osd crush remove osd.i
> ceph auth del osd.i
> ceph osd rm osd.i
>
> From here, the disk is removed from the ceph cluster and crush map, and is ready for physical removal and replacement.
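Since the removal sequence above is run the same way every time, it can be wrapped in a small helper script. This is only a minimal sketch under a few assumptions: the script name and the idea of passing the OSD id as the first argument are hypothetical, and the 'stop ceph-osd id=N' line is the upstart form used above (on a systemd host the equivalent would be 'systemctl stop ceph-osd@N'):

    #!/bin/bash
    # remove-osd.sh (hypothetical helper): remove a failed OSD by its numeric id.
    # Usage: ./remove-osd.sh 12   (run on the host that owns the OSD)
    set -e
    ID="$1"

    ceph osd down "osd.${ID}"           # mark the OSD down
    ceph osd out "osd.${ID}"            # mark it out so data rebalances away
    stop ceph-osd id="${ID}"            # stop the daemon (upstart; systemd: systemctl stop ceph-osd@${ID})
    umount "/var/lib/ceph/osd/ceph-${ID}"
    ceph osd crush remove "osd.${ID}"   # remove it from the crush map
    ceph auth del "osd.${ID}"           # delete its cephx key
    ceph osd rm "osd.${ID}"             # remove the OSD id from the cluster

After this the disk can be pulled and the replacement prepared as described below.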
> From there I deploy the new OSD with ceph-deploy from my admin node using:
>
> ceph-deploy disk list nodei
> ceph-deploy disk zap nodei:sdX
> ceph-deploy --overwrite-conf osd prepare nodei:sdX
>
> This will prepare the disk and insert it back into the crush map, bringing it back up and in. The OSD number should remain the same, as it will fill the gap left by the previous OSD removal.
>
> Hopefully this helps,
>
> Reed
>
> On Sep 14, 2016, at 11:00 AM, Jim Kilborn <jim@xxxxxxxxxxxx> wrote:
>
> > I am finishing testing our new cephfs cluster and wanted to document a failed OSD procedure.
> > I noticed that when I pulled a drive to simulate a failure and ran through the replacement steps, the OSD had to be removed from the crush map in order to initialize the new drive as the same OSD number.
> >
> > Is it correct that I have to remove it from the crush map and then, after the OSD is initialized and mounted, add it back to the crush map? Is there no way to have it reuse the same OSD # without removing it from the crush map?
> >
> > Thanks for taking the time.
> >
> > - Jim

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
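For reference, here is a rough sketch of the kind of 'osd crush location hook' Nick suggests above. It is not Wido's script from the gist; the device lookup and the ssd/sata root and host bucket names are assumptions that would have to be adapted to the actual crush tree, and the trailing-digit device parsing only handles simple /dev/sdX partitions:

    #!/bin/bash
    # Hypothetical crush location hook: place an OSD under an ssd or sata bucket
    # depending on whether its backing disk is rotational. This is NOT Wido's
    # script from the gist above, just an illustration of the same idea.
    #
    # Ceph calls the hook with arguments like: --cluster <name> --id <osd-id> --type osd
    # and expects a crush location ("key=value ...") on stdout.

    while [ $# -gt 0 ]; do
        case "$1" in
            --id) ID="$2"; shift ;;
        esac
        shift
    done

    HOST=$(hostname -s)

    # Resolve the block device backing this OSD's data directory, then check the
    # kernel's rotational flag (0 = SSD, 1 = spinning disk).
    DEV=$(df "/var/lib/ceph/osd/ceph-${ID}" | awk 'NR==2 {print $1}')
    DISK=$(basename "${DEV}" | sed 's/[0-9]*$//')   # e.g. /dev/sdb1 -> sdb
    ROT=$(cat "/sys/block/${DISK}/queue/rotational" 2>/dev/null)

    # The bucket names below (<host>-ssd / <host>-sata, roots ssd / sata) are
    # examples only and must match buckets that already exist in the crush map.
    if [ "${ROT}" = "0" ]; then
        echo "host=${HOST}-ssd root=ssd"
    else
        echo "host=${HOST}-sata root=sata"
    fi

To use something like this, the script path would be set in ceph.conf via 'osd crush location hook = /path/to/hook.sh' (path hypothetical), and 'osd crush update on start' would need to be left at its default of true so the reported location is actually applied when the OSD starts.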