On 03.09.2013, at 16:27, Sage Weil <sage@xxxxxxxxxxx> wrote:

>> ceph osd create  # this should give you back the same osd number as the
>> one you just removed
>
> OSD=`ceph osd create`  # may or may not be the same osd id

good point - so far it has been good to us!

>> umount ${PART}1
>> parted $PART rm 1                    # remove the old partition
>> parted $PART mkpart primary 0% 100%  # create a new one
>
> I don't think the partition removal/add step is needed.

it isn't - I'm still learning the ropes :)

> Otherwise it looks fine!

ok - I have tried a simplified version (one that doesn't take the OSD out)
which just "simulates" a disk failure (i.e. it stops the OSD, reformats the
drive, recreates the OSD structure and starts the daemon again).

This seems to work, but rebuilding the disk is really slow - we see write
speeds of 4-20 MB/s, and it takes ages to refill around 100 GB of data.

I don't dare to run this on multiple OSDs at the same time for fear of
losing data, so the "slower/longer" process of first marking all OSDs of a
server as out, waiting for them to empty, and then batch-formatting all
OSDs on the server and waiting for the cluster to be stable again might be
faster in the end.

cheers
jc
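
PS: given Sage's point that the id may differ, a belt-and-braces check in
the script might be something like this (OLD_OSD is whatever id was just
removed - the value here is only an example):

    OLD_OSD=12                   # the id we just removed (example)
    OSD=`ceph osd create`        # allocate an id - not guaranteed to match
    if [ "$OSD" != "$OLD_OSD" ]; then
        echo "got osd.$OSD instead of osd.$OLD_OSD - check mount points" >&2
    fi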
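
PPS: for the record, the simplified "simulate a failure" sequence I'm
testing looks roughly like this - the id and device name are examples, and
the exact commands (especially the auth caps) should be double-checked
against the docs for your version:

    ID=12                              # id of the failed OSD (example)
    PART=/dev/sdb                      # replacement disk (example)
    service ceph stop osd.$ID          # stop the daemon
    umount /var/lib/ceph/osd/ceph-$ID  # detach the old filesystem
    mkfs.xfs -f ${PART}1               # "reformat the drive"
    mount ${PART}1 /var/lib/ceph/osd/ceph-$ID
    ceph-osd -i $ID --mkfs --mkkey     # recreate the OSD data structure
    ceph auth del osd.$ID              # throw away the old key...
    ceph auth add osd.$ID osd 'allow *' mon 'allow rwx' \
        -i /var/lib/ceph/osd/ceph-$ID/keyring   # ...and register the new one
    service ceph start osd.$ID         # backfill starts once it's up and in

Since the OSD keeps its id and CRUSH position, no ceph osd rm / ceph osd
create round-trip is needed in this variant.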
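
PPPS: the per-server batch variant would then be something along these
lines (ids are examples, and the waiting is manual - ceph -w is only there
to watch the cluster go back to active+clean):

    for ID in 12 13 14 15; do          # all OSDs on this server (examples)
        ceph osd out $ID               # start draining data off the disks
    done
    ceph -w                            # wait until the drain has finished
    # ...then reformat and recreate each OSD as above, and let the data
    # flow back:
    for ID in 12 13 14 15; do
        ceph osd in $ID
    done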