(a) This is true when using ceph-deploy for a cluster: it sets up one Ceph monitor per node for a given cluster. You can have many Ceph monitors, but the typical high-availability cluster has 3-5 monitor nodes. With a manual install, you could conceivably install multiple monitors for the same cluster onto a single node, but this isn't a best practice, since the node is a failure domain. The monitor is part of the cluster, not the node. So you can have thousands of nodes running Ceph daemons that are members of the cluster "ceph." A node that has a monitor for cluster "ceph" will monitor all Ceph OSD daemons and MDS daemons across those thousands of nodes. That same node could also have a monitor for cluster "deep-storage," or whatever cluster name you choose. (There's a rough command sketch below.)

(b) I'm actually working on a reference architecture for Calxeda that asks exactly that question. My personal feeling is that having a machine/host/chassis optimized for a particular purpose (e.g., running Ceph OSDs) is the ideal scenario, since you can just add hardware to the cluster to expand it: you don't need to add monitors or MDSs in order to add OSDs. The upcoming Calxeda offerings provide excellent value in the cost/performance tradeoff--you get a lot of storage density and good performance. High-performance clusters (e.g., using SSDs for journals, with more RAM and CPU power) cost more, but you still face some of the same issues. I still don't have a firm opinion on this, but my gut tells me that OSDs should be kept separate from the other daemons--build OSD hosts with dense storage. Kernel fsync contention, for example, means that running monitors and OSDs on the same host generally leads to performance problems. See http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#osds-are-slow-unresponsive for examples of why making different types of processes co-resident on the same host can hurt performance; processes like monitors shouldn't be co-resident with OSDs. So that you don't waste hosts on lightweight processes like Ceph monitors, it may be ideal to place your MDS daemons, Apache/RGW daemons, OpenStack/CloudStack services, and/or VMs on those nodes. You need to consider the CPU, RAM, disk I/O, and network implications of co-resident applications. (There's a rough layout sketch below.)

(d) If you have three monitors, Paxos will still work: 2 out of 3 monitors is a majority. A failure of a monitor means it's down, but not out. If the failed monitor were removed from the cluster, you'd be left with only two monitors, with no tolerance for a further failure. That's why 3 monitors is the minimum for high availability. 4 works too, because 3 out of 4 is also a majority. Some people like using an odd number of monitors, since you can never have an equal split of monitors up and down; however, this isn't a requirement for Paxos. 3 out of 4 and 3 out of 5 both constitute a majority. (The arithmetic is sketched below.)
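
To illustrate (a): a rough, untested sketch of how two clusters might share the same nodes, distinguished only by the cluster name. The hostnames are made up; see the "multiple clusters" and "naming a cluster" links quoted below for the real procedure.

    # Illustration only -- hostnames are hypothetical.
    # The default cluster name is "ceph"; a second cluster gets its own name,
    # which maps to its own config file at /etc/ceph/{cluster}.conf.
    ceph-deploy new mon1 mon2 mon3                           # cluster "ceph"
    ceph-deploy --cluster deep-storage new mon1 mon2 mon3    # cluster "deep-storage"
    # If both clusters run monitors on the same hosts, the second cluster's
    # monitors need to listen on different ports than the first's.

    # Daemons and clients select a cluster by name:
    ceph --cluster ceph health
    ceph --cluster deep-storage health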
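
To illustrate (b), here is a sketch of the kind of role separation I mean: monitors (and other lightweight daemons) on small nodes, OSDs on dense storage nodes. The hostnames and device names are hypothetical, and the exact ceph-deploy syntax varies by version.

    # Hypothetical layout: mon1-mon3 are small nodes; osd1 and osd2 are dense storage nodes.
    ceph-deploy mon create mon1 mon2 mon3        # monitors (MDS/RGW daemons could live here too)
    ceph-deploy osd create osd1:sdb osd1:sdc     # OSD hosts run only OSD daemons
    ceph-deploy osd create osd2:sdb osd2:sdc
    # Expanding capacity later just means adding more OSD hosts -- no new monitors required.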
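
To illustrate (d), the quorum rule is a simple majority of all the monitors in the monitor map (including any that are down). A quick sketch of the arithmetic:

    # A quorum needs floor(N/2) + 1 monitors up, where N is the number of
    # monitors in the monitor map (down monitors still count toward N).
    for N in 3 4 5; do
      echo "monitors=$N majority=$(( N / 2 + 1 )) failures_tolerated=$(( N - (N / 2 + 1) ))"
    done
    # monitors=3 majority=2 failures_tolerated=1
    # monitors=4 majority=3 failures_tolerated=1
    # monitors=5 majority=3 failures_tolerated=2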

On Fri, Jul 26, 2013 at 11:29 AM, Hariharan Thantry <thantry@xxxxxxxxx> wrote:
> Hi John,
>
> Thanks for the responses.
>
> For (a), I remember reading somewhere that one can only run a maximum of 1
> monitor/node. I assume that implies the single monitor process will be
> responsible for ALL Ceph clusters on that node, correct?
>
> So (b) isn't really a Ceph issue, that's nice to know. Any recommendations
> on the minimum kernel/glibc version and minimum RAM size requirements where
> Ceph can be run on a single client in native mode? The reason I ask is that
> in a few deployment scenarios (especially non-standard ones like telco
> platforms), hardware gets added gradually, so it's more important to be able
> to scale the cluster out gracefully. I actually see Ceph as an alternative
> to a SAN, using JBODs from machines to create a larg(ish) storage cluster.
> Plus, usually, the clients would probably be running on the same hardware
> as the OSD/MON, because space on the chassis is at a premium.
>
> (d) I was thinking about single-node failure scenarios. With 3 nodes,
> wouldn't a failure of 1 node cause Paxos to not work?
>
> Thanks,
> Hari
>
> On Fri, Jul 26, 2013 at 10:00 AM, John Wilkins <john.wilkins@xxxxxxxxxxx> wrote:
>>
>> (a) Yes. See
>> http://ceph.com/docs/master/rados/configuration/ceph-conf/#running-multiple-clusters
>> and
>> http://ceph.com/docs/master/rados/deployment/ceph-deploy-new/#naming-a-cluster
>> (b) Yes. See
>> http://wiki.ceph.com/03FAQs/01General_FAQ#How_Can_I_Give_Ceph_a_Try.3F
>> Mounting kernel modules on the same node as Ceph daemons can cause
>> older kernels to deadlock.
>> (c) Someone else can probably answer that better than me.
>> (d) At least three. Paxos requires a simple majority, so 2 out of 3 is
>> sufficient. See
>> http://ceph.com/docs/master/rados/configuration/mon-config-ref/#background
>> particularly the monitor quorum section.
>>
>> On Wed, Jul 24, 2013 at 4:03 PM, Hariharan Thantry <thantry@xxxxxxxxx> wrote:
>> > Hi folks,
>> >
>> > Some very basic questions.
>> >
>> > (a) Can I be running more than 1 Ceph cluster on the same node? (Assume
>> > that I have no more than 1 monitor/node, but storage is contributed by
>> > one node into more than 1 cluster.)
>> > (b) Are there any issues with running Ceph clients on the same node as
>> > the other Ceph storage cluster entities (OSD/MON)?
>> > (c) Is the best way to access the Ceph storage cluster in native mode by
>> > multiple clients to host a shared-disk filesystem on top of RBD (like
>> > OCFS2)? What if these clients were running inside VMs? Could one then
>> > create independent partitions on top of RBD and give a partition to each
>> > of the VMs?
>> > (d) Isn't the realistic minimum for the # of monitors in a cluster at
>> > least 4 (to guard against one failure)?
>> >
>> > Thanks,
>> > Hari
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>> --
>> John Wilkins
>> Senior Technical Writer
>> Inktank
>> john.wilkins@xxxxxxxxxxx
>> (415) 425-9599
>> http://inktank.com

--
John Wilkins
Senior Technical Writer
Inktank
john.wilkins@xxxxxxxxxxx
(415) 425-9599
http://inktank.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com