Re: Replacing a failed OSD disk drive (or replace XFS with BTRFS)

When you reformat the drive, it gets a new UUID, so to Ceph it looks like a brand-new drive. This does seem heavy-handed, but Ceph was designed for things to fail, and it is not unusual to do it this way. Ceph is not RAID, so you usually have to unlearn some habits.

You could probably keep the UUID and the auth key between reformats, but in my experience it is so easy to just have Ceph regenerate them that it's not worth the hassle of trying to keep track of it all.

In our testing we formatted the cluster over a dozen times without losing data. Because there wasn't much data on it, we were able to format 40 OSDs in under 30 minutes (we did a whole host at a time because we knew that was safe) with a few small scripts.
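
If you're curious what actually changes between reformats, the per-OSD UUID is easy to inspect; for example (osd.3 here is just a placeholder for whichever OSD you're looking at):

    cat /var/lib/ceph/osd/ceph-3/fsid      # the UUID stored in the OSD data directory
    ceph osd dump | grep '^osd\.3 '        # the cluster's record of that OSD, including its uuid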

Short answer is don't be afraid to do it this way.

Robert LeBlanc

Sent from a mobile device; please excuse any typos.

On Mar 21, 2015 5:11 AM, "Datatone Lists" <lists@xxxxxxxxxxxxxx> wrote:
I have been experimenting with Ceph, and have some OSDs with drives
containing XFS filesystems which I want to change to BTRFS.
(I started with BTRFS, then started again from scratch with XFS
[currently recommended] in order to eliminate that as a potential cause
of some issues; now, with further experience, I want to go back to
BTRFS, but I have data in my cluster and I don't want to scrap it.)

This is exactly equivalent to the case in which I have an OSD with a
drive that I see is starting to error. I would in that case need to
replace the drive and recreate the Ceph structures on it.

So, I mark the OSD out, and the cluster automatically eliminates its
notion of data stored on the OSD and creates copies of the affected PGs
elsewhere to make the cluster healthy again.

All of the disk replacement instructions that I see then tell me to
follow an OSD removal process:

"This procedure removes an OSD from a cluster map, removes its
authentication key, removes the OSD from the OSD map, and removes the
OSD from the ceph.conf file".
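
For concreteness, my understanding is that this removal procedure amounts to roughly the following (using osd.3 as a placeholder for the affected OSD):

    ceph osd out 3                    # mark it out so its data migrates elsewhere
    # stop the ceph-osd daemon on its host
    ceph osd crush remove osd.3       # remove it from the CRUSH map
    ceph auth del osd.3               # delete its authentication key
    ceph osd rm 3                     # remove it from the OSD map
    # and finally delete any [osd.3] section from ceph.conf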

This seems to me to be too heavy-handed. I'm worried about doing this
and then effectively adding a new OSD that ends up with the same id
number as the OSD I apparently removed unnecessarily.

I don't actually want to remove the OSD. The OSD is fine, I just want
to replace the disk drive that it uses.

This suggests that I really want to take the OSD out, allow the cluster
to get healthy again, then replace the disk (if this is due to a
failure), create a new BTRFS/XFS filesystem, remount the drive, and
recreate the Ceph structures on it so that it is compatible with the
original OSD it was attached to.

The OSD then gets marked back in, and the cluster says "hello again, we
missed you, but it's good to see you back, here are some PGs ...".
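
Roughly, the sequence I have in mind would be something like the following (osd.3 and /dev/sdX are just placeholders, and I'm assuming the usual manual OSD preparation commands apply):

    ceph osd out 3                          # let the cluster re-replicate first
    # stop the ceph-osd daemon and unmount the old filesystem
    umount /var/lib/ceph/osd/ceph-3
    mkfs.btrfs /dev/sdX                     # new filesystem (replace the disk first if it failed)
    mount /dev/sdX /var/lib/ceph/osd/ceph-3
    ceph-osd -i 3 --mkfs --mkkey            # recreate the OSD data structures and a keyring
    ceph auth del osd.3                     # the newly generated key won't match the old one
    ceph auth add osd.3 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-3/keyring
    ceph osd in 3                           # mark it back in and start the daemon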

What I'm saying is that I really don't want to destroy the OSD, I want
to refresh it with a new disk/filesystem and put it back to work.

Is there some fundamental reason why this can't be done? If not, how
should I do it?

Best regards,
David

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com