On Jun 24, 2013, at 11:22 AM, Brian Candler <b.candler@xxxxxxxxx> wrote:

> I'm just finding my way around the Ceph documentation. What I'm hoping to build are servers with 24 data disks and one O/S disk. From what I've read, the recommended configuration is to run 24 separate OSDs (or 23 if I have a separate journal disk/SSD), and not have any sort of in-server RAID.
>
> Obviously, disks are going to fail - and the documentation acknowledges this.
>
> What I'm looking for is a documented procedure for replacing a failed disk, but so far I have not been able to find one. Can you point me at the right place please?
>
> I'm looking for something step-by-step and as idiot-proof as possible :-)

The official documentation is maybe not 100% idiot-proof, but it is step-by-step:

http://ceph.com/docs/master/rados/operations/add-or-rm-osds/

If you lose a disk, you want to remove the OSD associated with it. This will trigger a data migration, so you are back to full redundancy as soon as it finishes. Whenever you get a replacement disk, you will add an OSD for it (the same as if you were adding an entirely new disk). This will also trigger a data migration, so the new disk will be utilized immediately.

If you have a spare or replacement disk available immediately after a disk goes bad, you could maybe save some data migration by doing the removal and re-adding within a short period of time, but otherwise "drive replacement" looks exactly like retiring an OSD and adding a new one that happens to use the same drive slot.

JN
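P.S. For the archives, the removal half of that page boils down to something like the following. Treat it as a sketch rather than the authoritative procedure: osd.12 is a stand-in for whichever OSD actually died, and the stop command depends on your distro/init system.

    # Mark the OSD out of the cluster; data migration back to full
    # redundancy starts as soon as this takes effect
    ceph osd out 12

    # Stop the dead daemon (sysvinit shown; on upstart systems it would
    # be "stop ceph-osd id=12")
    /etc/init.d/ceph stop osd.12

    # Remove it from the CRUSH map, delete its auth key, and free the ID
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm 12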
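And the add half, once the replacement drive is in the slot. Again just a sketch: this assumes the cluster was deployed with ceph-disk (or ceph-deploy, which wraps it) and that the new disk shows up as /dev/sdb - substitute your actual device, and note that slightly older releases ship this as separate ceph-disk-prepare / ceph-disk-activate scripts. The long-form manual steps (ceph osd create, mkfs, ceph-osd --mkfs --mkkey, ceph auth add, ceph osd crush add) are all on the page linked above.

    # Partition, format, and tag the new disk for Ceph; a freshly
    # prepared disk gets a brand-new OSD ID when activated
    ceph-disk prepare /dev/sdb

    # On most setups udev activates the disk automatically; if not,
    # activate the data partition by hand
    ceph-disk activate /dev/sdb1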