On Jun 24, 2013, at 11:22 AM, Brian Candler <b.candler@xxxxxxxxx> wrote:

> I'm just finding my way around the Ceph documentation. What I'm hoping to build are servers with 24 data disks and one O/S disk. From what I've read, the recommended configuration is to run 24 separate OSDs (or 23 if I have a separate journal disk/SSD), and not have any sort of in-server RAID.
>
> Obviously, disks are going to fail - and the documentation acknowledges this.
>
> What I'm looking for is a documented procedure for replacing a failed disk, but so far I have not been able to find one. Can you point me at the right place please?
>
> I'm looking for something step-by-step and as idiot-proof as possible :-)

The official documentation is maybe not 100% idiot-proof, but it is step-by-step:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/

If you lose a disk, you want to remove the OSD associated with it. This will trigger a data migration, so you are back to full redundancy as soon as it finishes. Whenever you get a replacement disk, you will add an OSD for it (the same as if you were adding an entirely new disk). This will also trigger a data migration, so the new disk will be utilized immediately.

If you have a spare or replacement disk available immediately after a disk goes bad, you could maybe save some data migration by doing the removal and re-adding within a short period of time, but otherwise "drive replacement" looks exactly like retiring an OSD and adding a new one that happens to use the same drive slot.

JN
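P.S. For the archives, the removal half of that page boils down to something like the following. Treat it as a sketch rather than the authoritative procedure: osd.12 is a stand-in for whichever OSD actually died, and the stop command depends on your distro/init system.

    # Mark the OSD out of the cluster; data migration back to full
    # redundancy starts as soon as this takes effect
    ceph osd out 12

    # Stop the dead daemon (sysvinit shown; on upstart systems it would
    # be "stop ceph-osd id=12")
    /etc/init.d/ceph stop osd.12

    # Remove it from the CRUSH map, delete its auth key, and free the ID
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12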
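And the add half, once the replacement drive is in the slot. Again just a sketch: this assumes the cluster was deployed with ceph-disk (or ceph-deploy, which wraps it) and that the new disk shows up as /dev/sdb - substitute your actual device, and note that slightly older releases ship this as separate ceph-disk-prepare / ceph-disk-activate scripts. The long-form manual steps (ceph osd create, mkfs, ceph-osd --mkfs --mkkey, ceph auth add, ceph osd crush add) are all on the page linked above.

    # Partition, format, and tag the new disk for Ceph; a freshly
    # prepared disk gets a brand-new OSD ID when activated
    ceph-disk prepare /dev/sdb

    # On most setups udev activates the disk automatically; if not,
    # activate the data partition by hand
    ceph-disk activate /dev/sdb1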