2016-10-26 23:38 GMT+02:00 Joe Julian <joe@xxxxxxxxxxxxxxxx>:

> Quickly = MTTR is within tolerances to continue to meet SLA. It's just math.

Obviously yes. But in the real world you can have the best SLAs in the world: if you lose data, you lose customers.

> As for a dedicated heal network, split-horizon dns handles that just fine.
> Clients resolve a server's hostname to the "eth1" (for example) address and
> the servers themselves resolve the same hostname to the "eth0" address. We
> played with bonding but decided against the complexity.

Good idea, thanks. In this way the cluster network is separated from the client network, like with Ceph. Just a question: you need two DNS infrastructures for this, right? ns1 and ns2, used by the clients, pointing to the eth1 addresses, and ns3 and ns4, used by Gluster itself, pointing to the eth0 addresses. In a small environment the hosts file could be used (a minimal sketch is at the end of this mail), but I prefer the DNS way.

> There's preference and there's engineering to meet requirements. If your SLA
> is 5 nines and you engineer 6 nines, you may realize that the difference
> between a 99.99993% uptime and a 99.99997% uptime isn't worth the added
> expense of doing replication and raid-1.

How do you calculate the number of nines in this environment? For example, to get 6 nines (for both availability and data consistency), which configuration should I adopt? I could have 6 nines for the whole cluster but only 2 nines for the data: in the first case the whole cluster can't go completely down (tons of nodes, for example); in the second, some data could be lost (replica 1 or 2). A rough back-of-the-envelope calculation is sketched at the end of this mail.

> With 300 drives, 60 bricks, replica 3 (across 3 racks), I have a six nines
> availability for any one replica subvolume. If you really want to fudge the
> numbers, the reliability for any given file is not worth calculating in that
> volume. The odds of all three bricks failing for any 1 file among 20
> distribute subvolumes is statistically infinitesimal.

How many servers? 300 drives bought in a very short time are likely to fail quickly, with multiple failures close together. I had 2 drive failures in less than 1 hour a few months ago; luckily I was using RAID-6. Both drives were from the same manufacturer and had sequential serial numbers.
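
For reference, a minimal sketch of the split-horizon idea done with hosts files instead of DNS, with made-up hostnames and addresses (gluster1/gluster2, 192.168.1.x and 10.0.0.x are only examples, not anyone's real setup):

    # /etc/hosts on the *clients*: server names resolve to the
    # client-facing (eth1) network
    192.168.1.11   gluster1
    192.168.1.12   gluster2

    # /etc/hosts on the *servers*: the same names resolve to the
    # storage/heal (eth0) network, so heal traffic stays off the
    # client network
    10.0.0.11      gluster1
    10.0.0.12      gluster2

With real DNS the same effect comes from serving clients and servers different zone data for the same names, e.g. separate resolver pairs as above, or BIND views.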
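
And a rough back-of-the-envelope sketch of the "number of nines" math for one replica-3 subvolume, assuming independent brick failures and a made-up 99% per-brick availability (not Joe's real figures):

    import math

    per_brick_availability = 0.99            # assumed: each brick up 99% of the time
    per_brick_downtime = 1 - per_brick_availability

    # a replica-3 subvolume is unavailable only when all 3 bricks are
    # down at the same time (independence assumed, which a batch of
    # drives with sequential serial numbers can easily violate)
    subvol_downtime = per_brick_downtime ** 3
    subvol_availability = 1 - subvol_downtime

    print("availability: %.6f" % subvol_availability)          # 0.999999
    print("nines: %.1f" % (-math.log10(subvol_downtime)))      # 6.0

Two nines per brick already multiply out to six nines for the replica set, which is why the odds of losing all three copies of one file at once are, as Joe says, statistically infinitesimal -- as long as the failures really are independent.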