Hi,

I couldn't agree more, but just to re-emphasize what others have already said: the point of replica 3 is not to add extra safety against (human|software|server) failures, but to keep enough copies around to allow rebalancing the cluster when disks fail.

Beyond a certain number of disks in a cluster, you're going to see disk failures all the time. If you don't pay extra attention (and waste lots and lots of time/money) to carefully arrange/choose disks from different vendors' production lines/dates, simultaneous disk failures can happen within minutes.

An example from our past: on our (at that time small) cluster of 72 disks spread over 6 storage nodes, half of the disks were Seagate Enterprise Capacity, the other half Western Digital Red Pro. For each disk manufacturer, we bought only half of the disks from the same production charge. So we had:

* 18 WD disks, production charge A
* 18 WD disks, production charge B
* 18 Seagate disks, production charge C
* 18 Seagate disks, production charge D

One day, 6 disks failed simultaneously, spread over two storage nodes. Had we been running replica 2, we couldn't have recovered and would have lost data. Instead, because of replica 3, we didn't lose any data, and Ceph automatically rebalanced everything before any further disks failed.

So: if the data stored on the cluster is valuable (because it costs much time and effort to 're-collect' it, or you can't accept the time it takes to restore from backup, or, worse, to re-create it from scratch), you have to assume that whatever manufacturer/production charge of HDs you're using, they *can* all fail at the same time, because you could have hit a faulty production run. The only way out here is replica >= 3.

(Of course, the whole MTBF discussion and "why RAID doesn't scale" applies as well.)

Regards,
Daniel
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
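To make the replica 2 vs. replica 3 difference concrete, here is a rough Monte Carlo sketch of that incident. All the numbers are assumptions for illustration: 72 disks on 6 nodes as in our setup, a hypothetical PG count of 2048, the 6 simultaneous failures assumed to be split 3+3 across two nodes, and placement simplified to "pick r disks on r distinct hosts" (a crude stand-in for CRUSH with a host failure domain, not how CRUSH actually works):

```python
import random

NODES, DISKS_PER_NODE = 6, 12   # 72 disks, as in our cluster
PGS = 2048                      # hypothetical placement-group count
TRIALS = 200

def place_pg(r):
    # crude stand-in for CRUSH: r replicas on r distinct hosts,
    # random disk on each host
    hosts = random.sample(range(NODES), r)
    return {(h, random.randrange(DISKS_PER_NODE)) for h in hosts}

def trial(r):
    # 6 disks fail at once, assumed 3 on each of two nodes
    bad_nodes = random.sample(range(NODES), 2)
    failed = set()
    for h in bad_nodes:
        failed |= {(h, d) for d in random.sample(range(DISKS_PER_NODE), 3)}
    # data is lost if any PG had *all* replicas on failed disks
    return any(place_pg(r) <= failed for _ in range(PGS))

for r in (2, 3):
    loss = sum(trial(r) for _ in range(TRIALS)) / TRIALS
    print(f"replica {r}: P(some PG loses all copies) ~ {loss:.2f}")
```

Under these assumptions, replica 2 loses at least one PG in essentially every trial, while replica 3 never can: with replicas on three distinct hosts, failures confined to two nodes can never take out all copies of a PG.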