Re-replication does not start until the OSD is marked “out.” Marking it out creates a new mapping in the cluster of where the PGs should be, so data begins to move and/or the cluster creates enough copies to restore the replica count. This scheme lets you control how and when the replication occurs.

If you have plenty of space and you aren’t going to replace the drive immediately, just mark the OSD “down” and “out.”

If you are going to replace the drive immediately, set the “noout” flag, take the OSD “down,” and replace the drive. Assuming the replacement is mounted in the same place as the bad drive, bring the OSD back up. The cluster will then replicate exactly the same PGs the bad drive held back onto the replacement drive. As was stated before, don’t forget to run “ceph osd unset noout” afterwards.

Keep in mind that when a machine has a hardware failure that takes OSD(s) down, there is an automatic timeout that marks them “out” so the cluster can recover unattended. Unless you are monitoring the cluster 24/7, you should keep enough free disk space available to handle such failures.

Related info in:

On Nov 15, 2013, at 1:58 AM, Mihály Árva-Tóth <mihaly.arva-toth@xxxxxxxxxxxxxxxxxxxxxx> wrote:
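For reference, the immediate-replacement sequence described in this reply might be sketched as the small shell helper below. This is only a rough outline under assumptions that are not in the original message: the function names (`run`, `before_swap`, `after_swap`, `retire_osd`) and the `DRY_RUN` switch are made up for illustration, and the `ceph-osd@<id>` systemd unit name assumes a systemd-managed deployment (older init systems start and stop OSDs differently). By default it only prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper, not an official Ceph tool.
# DRY_RUN=1 (the default here) prints each command; DRY_RUN=0 executes it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: stop the cluster from marking OSDs "out", then take the OSD down.
before_swap() {
    run ceph osd set noout
    # Assumption: systemd unit naming; adjust for your init system.
    run systemctl stop "ceph-osd@$1"
}

# Step 2 (manual, not scripted): swap the drive and mount the replacement
# in the same place as the old one.

# Step 3: bring the OSD back up and clear the flag.
after_swap() {
    run systemctl start "ceph-osd@$1"
    run ceph osd unset noout
}

# Alternative when the drive will not be replaced soon:
# take the OSD down and mark it "out" so re-replication starts.
retire_osd() {
    run systemctl stop "ceph-osd@$1"
    run ceph osd out "$1"
}
```

With the defaults, calling `before_swap 12` just prints the two commands for a hypothetical OSD id 12, which makes it easy to review the sequence before running anything against a live cluster.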
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com