If you want your data to be N+2 redundant (able to handle 2 failures, more or less), then you need to set size=3 and have 3 replicas of your data. If you want your monitors to be N+2 redundant, then you need 5 monitors. If you feel that your data is worth size=3, then you should really try to have 5 monitors. Unless you're building a cluster with <5 servers, of course. This is common to pretty much every quorum-based system in existence, not just Ceph.

In my experience, 1 replica is fine for test instances that have no expectation of data persistence or availability, 3 replicas are okay for small instances that don't need any sort of strong availability guarantee, and 5 replicas are really where you need to be for any sort of large-scale production use.

I've been stuck running 3-way replicated quorum systems in large-scale production, and it made any sort of planned maintenance -- or really any back-end outage at all -- absolutely terrifying, because you're left operating completely without a net: any additional failure and the service craters spectacularly and publicly. Since I really hate reading newspaper articles about outages in my systems, I use 5-way quorums whenever possible.
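To put rough numbers on the argument, here is a quick back-of-the-envelope sketch (plain Python, nothing Ceph-specific; the member counts are just illustrative):

    # Failure-tolerance arithmetic for quorum members vs. data replicas.
    # A majority quorum of N members stays up through floor((N - 1) / 2)
    # failures; a pool with size=S keeps its data as long as one replica
    # survives, i.e. through S - 1 failures.

    def quorum_tolerance(n_members):
        """Failures a majority quorum of n_members can ride out."""
        return (n_members - 1) // 2

    def replica_tolerance(size):
        """Replica losses a pool of this size can absorb without data loss."""
        return size - 1

    for mons in (1, 3, 5, 7):
        print("%d mon(s): quorum survives %d failure(s)"
              % (mons, quorum_tolerance(mons)))
    for size in (1, 2, 3):
        print("size=%d: data survives %d lost replica(s)"
              % (size, replica_tolerance(size)))

That is the point: size=3 protects the data against two lost copies, but it takes 5 monitors to give the quorum the same two-failure headroom; 3 monitors only buy you one.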
Scott

On Sat Aug 30 2014 at 7:40:18 PM Joao Eduardo Luis <joao.luis at inktank.com> wrote:

> Nigel mistakenly replied just to me, so I'm CC'ing the list.
>
> On 08/30/2014 08:12 AM, Nigel Williams wrote:
> > On Sat, Aug 30, 2014 at 11:59 AM, Joao Eduardo Luis
> > <joao.luis at inktank.com> wrote:
> >> But yeah, if you're going with 2 or 4, you'll be better off with 3 or 5.
> >> As long as you don't go with 1 you should be okay.
> >
> > In a recent panel discussion, one member strongly advocated 5 as the
> > minimum number of MONs for a large Ceph deployment. Large in this case
> > was PBs of storage.
> >
> > For a Ceph cluster with 100s of OSDs and 100s of TB across multiple
> > racks (and therefore many paths involved), is 5 MONs a good rule of
> > thumb, or is three sufficient?
>
> Whoever stated that was probably right. I don't often like to speak about
> what works best for (really) large deployments, as I don't often see them.
> In theory, 5 monitors will fare better than 3 for 100s of OSDs.
>
> As far as the monitors are concerned, this is mostly because 5 monitors
> can serve more maps concurrently than 3 monitors would. I don't think we
> have tests to back my reasoning here, but I don't think that the cluster
> workload or its size has that much of an impact on the number of monitors
> needed. Albeit a technical detail, the fact is that every message an OSD
> sends to a monitor that triggers a map update is *always* forwarded to
> the leader monitor. This means that regardless of how many monitors you
> have, you'll always end up with the same monitor dealing with the map
> updates, and that puts a cap on map update throughput -- this is not that
> big of a deal, usually, and knobs can be adjusted if need be.
>
> On the other hand, having 5 monitors instead of 3 means that you'll be
> able to spread OSD connections across more monitors, and even though
> updates are forwarded to the leader, connection-wise the load is more
> spread out -- the message is forwarded by the monitor the OSD connects
> to, and that monitor acts as a proxy in replying to the OSD, so there's
> less hammering of the leader directly.
>
> But the point where this may make a real difference is in serving osdmap
> updates. The OSDs need those updates, and even considering that OSDs will
> share maps amongst themselves, they still need to get them from somewhere
> -- and that somewhere is the monitor cluster. If you have 100s of OSDs
> connected to just 3 monitors, each monitor will end up serving bunches of
> reads (sending map updates to OSDs) while also dealing with messages that
> trigger map updates (which will in turn be forwarded to the leader).
> Given that any client (OSDs included) connects to a monitor at random at
> start and maintains that connection for a while, a rule of thumb would
> say that the leader ends up responsible for serving 1/3 of all map reads
> while still handling map updates. Having 5 monitors reduces this load to
> 1/5.
>
> However, I don't know of a good indicator of whether a given cluster
> should go with 5 monitors instead of 3, or 7 instead of 5. I don't think
> there are many clusters running 7 monitors, but it may be that for even
> larger clusters, having 5 or 7 monitors serving updates makes up for the
> increased number of messages required to commit an update -- keep in mind
> that, due to the nature of Paxos, one always needs an ack for an update
> from at least (N+1)/2 monitors. Again, this cuts both ways: we may have
> more messages being passed around, but given that each monitor is under a
> lower load, we may even get those acks back faster.
>
> I think I went a bit off track.
>
> Let me know if this led to further confusion instead.
>
> -Joao
>
> --
> Joao Eduardo Luis
> Software Engineer | http://inktank.com | http://ceph.com
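For what it's worth, the 1/3-vs-1/5 leader read share and the Paxos ack count Joao mentions are easy to tabulate. A rough sketch (plain Python, illustrative only -- real map-read load won't be spread perfectly evenly):

    # With N monitors and clients/OSDs picking a monitor at random, each
    # monitor (leader included) serves roughly 1/N of the map reads, while
    # every committed update still needs acks from a majority of monitors.

    from fractions import Fraction

    for n_mons in (3, 5, 7):
        leader_read_share = Fraction(1, n_mons)  # approx. share of map reads hitting the leader
        acks_needed = n_mons // 2 + 1            # Paxos majority, i.e. (N + 1) / 2 for odd N
        print("%d mons: leader serves ~%s of map reads, each commit needs %d acks"
              % (n_mons, leader_read_share, acks_needed))

More monitors shave the per-monitor read load but raise the ack count per commit, which is exactly the trade-off described above.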