On 10/26/2016 02:12 PM, Gandalf Corvotempesta wrote:
> 2016-10-26 23:07 GMT+02:00 Joe Julian <joe@xxxxxxxxxxxxxxxx>:
>> And yes, they can fail, but 20TB is small enough to heal pretty quickly.
>
> 20TB small enough to build quickly? On which network? Gluster doesn't have a dedicated cluster network; if the cluster is being heavily accessed, the healing will slow down everything else (or everything else will slow down the healing).

Quickly = MTTR is within tolerances to continue to meet SLA. It's just math.

As for a dedicated heal network, split-horizon DNS handles that just fine. Clients resolve a server's hostname to the "eth1" (for example) address and the servers themselves resolve the same hostname to the "eth0" address. We played with bonding but decided against the complexity.

> Anyway, you can heal quickly, but I still prefer to have data safe on each node. If you start with 3 servers at once, each disk probably comes from the same batch, so a massive disk failure is easy to get.

There's preference and there's engineering to meet requirements. If your SLA is 5 nines and you engineer 6 nines, you may realize that the difference between a 99.99993% uptime and a 99.99997% uptime isn't worth the added expense of doing replication and RAID-1.

> If you loose only 2 disks, one for each server, from the same replica group, you are game over. With RAID6, you have to loose 5 disks from the same replica group.

I never loose my drives. They're always firmly attached. :P

With 300 drives, 60 bricks, replica 3 (across 3 racks), I have six-nines availability for any one replica subvolume. If you really want to fudge the numbers, the reliability for any given file is not worth calculating in that volume. The odds of all three bricks failing for any 1 file among 20 distribute subvolumes are statistically infinitesimal.
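To put the uptime figures quoted above in perspective, here is a minimal Python sketch that converts an availability percentage into expected downtime per year and a count of nines. The function names are made up for illustration, and a 365-day year is assumed.

```python
import math

# Assumption: a non-leap, 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds_per_year(availability_percent):
    """Expected downtime per year for a given availability percentage."""
    return (1 - availability_percent / 100) * SECONDS_PER_YEAR


def nines(availability_percent):
    """Number of nines for an availability percentage, e.g. 99.999 -> 5."""
    return -math.log10(1 - availability_percent / 100)


for pct in (99.999, 99.99993, 99.99997):
    print(f"{pct}%: {downtime_seconds_per_year(pct):.1f} s/year, "
          f"{nines(pct):.2f} nines")
```

Under that assumption, 99.99993% and 99.99997% work out to roughly 22 and 9.5 seconds of downtime per year, a gap of about 12.6 seconds.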
> In my environment, I can create 4 RAID-0 on each server (3 disks in each RAID-0), or 2 RAID-6 with 6 disks each, or 1 RAID-6 with 12 disks, or 1 RAID-7 with 12 disks (RAID-7 with fewer than 12 disks is nonsense). I don't know which one is better.

Just do the reliability calculations and engineer a storage system to meet (exceed) your obligations within the available budget.

http://www.eventhelix.com/realtimemantra/faulthandling/system_reliability_availability.htm
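A minimal sketch of the standard series/parallel availability math (the kind of calculation the linked page covers), applied to one of the layouts mentioned in this thread: 3-disk RAID-0 bricks under replica 3. The disk MTBF and MTTR values are hypothetical placeholders, not measurements, and the model assumes independent failures, so it ignores same-batch disks, server or network outages, and RAID-6 style k-of-n redundancy.

```python
from math import prod


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of one component: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def series(parts):
    """All parts must be up (e.g. the disks of a RAID-0 set)."""
    return prod(parts)


def parallel(parts):
    """At least one part must be up (e.g. the bricks of a replica set)."""
    return 1 - prod(1 - a for a in parts)


# Hypothetical figures, chosen only for illustration.
DISK_MTBF_HOURS = 500_000
DISK_MTTR_HOURS = 48  # time to replace the disk and finish the heal

disk = availability(DISK_MTBF_HOURS, DISK_MTTR_HOURS)
raid0_brick = series([disk] * 3)         # 3-disk RAID-0 brick
replica3 = parallel([raid0_brick] * 3)   # replica 3 across three such bricks

print(f"single disk:        {disk:.8f}")
print(f"3-disk RAID-0:      {raid0_brick:.8f}")
print(f"replica 3 of those: {replica3:.12f}")
```

Plugging in real MTBF and MTTR numbers for each candidate layout gives figures that can be compared against the SLA; MTTR here is where heal time enters the math.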
_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users