(a) This is true when using ceph-deploy for a cluster: it sets up one Ceph monitor per node for a given cluster. You can have many Ceph monitors, but the typical high-availability cluster has 3-5 monitor nodes. With a manual install, you could conceivably install multiple monitors for the same cluster onto a single node, but this isn't a best practice, since the node is a failure domain. The monitor is part of the cluster, not the node. So you can have thousands of nodes running Ceph daemons that are members of the cluster "ceph." A node that has a monitor for cluster "ceph" will monitor all Ceph OSD daemons and MDS daemons across those thousands of nodes. That same node could also have a monitor for cluster "deep-storage," or whatever cluster name you choose. (There's a rough command sketch below.)

(b) I'm actually working on a reference architecture for Calxeda that asks exactly that question. My personal feeling is that having a machine/host/chassis optimized for a particular purpose (e.g., running Ceph OSDs) is the ideal scenario, since you can just add hardware to the cluster to expand it: you don't need to add monitors or MDSs in order to add OSDs. The upcoming Calxeda offerings provide excellent value in the cost/performance tradeoff--you get a lot of storage density and good performance. High-performance clusters (e.g., using SSDs for journals, with more RAM and CPU power) cost more, but you still face some of the same issues. I still don't have a firm opinion on this, but my gut tells me that OSDs should be kept separate from the other daemons--build OSD hosts with dense storage. Kernel fsync contention, for example, means that running monitors and OSDs on the same host generally leads to performance problems. See http://ceph.com/docs/master/rados/troubleshooting/troubleshooting-osd/#osds-are-slow-unresponsive for examples of why making different types of processes co-resident on the same host can hurt performance; processes like monitors shouldn't be co-resident with OSDs. So that you don't waste hosts on lightweight processes like Ceph monitors, it may be ideal to place your MDS daemons, Apache/RGW daemons, OpenStack/CloudStack services, and/or VMs on those nodes. You need to consider the CPU, RAM, disk I/O, and network implications of co-resident applications. (There's a rough layout sketch below.)

(d) If you have three monitors, Paxos will still work: 2 out of 3 monitors is a majority. A failure of a monitor means it's down, but not out. If the failed monitor were removed from the cluster, you'd be left with only two monitors, with no tolerance for a further failure. That's why 3 monitors is the minimum for high availability. 4 works too, because 3 out of 4 is also a majority. Some people like using an odd number of monitors, since you can never have an equal split of monitors up and down; however, this isn't a requirement for Paxos. 3 out of 4 and 3 out of 5 both constitute a majority. (The arithmetic is sketched below.)
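
To illustrate (a): a rough, untested sketch of how two clusters might share the same nodes, distinguished only by the cluster name. The hostnames are made up; see the "multiple clusters" and "naming a cluster" links quoted below for the real procedure.

    # Illustration only -- hostnames are hypothetical.
    # The default cluster name is "ceph"; a second cluster gets its own name,
    # which maps to its own config file at /etc/ceph/{cluster}.conf.
    ceph-deploy new mon1 mon2 mon3                           # cluster "ceph"
    ceph-deploy --cluster deep-storage new mon1 mon2 mon3    # cluster "deep-storage"
    # If both clusters run monitors on the same hosts, the second cluster's
    # monitors need to listen on different ports than the first's.

    # Daemons and clients select a cluster by name:
    ceph --cluster ceph health
    ceph --cluster deep-storage health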
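
To illustrate (b), here is a sketch of the kind of role separation I mean: monitors (and other lightweight daemons) on small nodes, OSDs on dense storage nodes. The hostnames and device names are hypothetical, and the exact ceph-deploy syntax varies by version.

    # Hypothetical layout: mon1-mon3 are small nodes; osd1 and osd2 are dense storage nodes.
    ceph-deploy mon create mon1 mon2 mon3        # monitors (MDS/RGW daemons could live here too)
    ceph-deploy osd create osd1:sdb osd1:sdc     # OSD hosts run only OSD daemons
    ceph-deploy osd create osd2:sdb osd2:sdc
    # Expanding capacity later just means adding more OSD hosts -- no new monitors required.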
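
To illustrate (d), the quorum rule is a simple majority of all the monitors in the monitor map (including any that are down). A quick sketch of the arithmetic:

    # A quorum needs floor(N/2) + 1 monitors up, where N is the number of
    # monitors in the monitor map (down monitors still count toward N).
    for N in 3 4 5; do
      echo "monitors=$N majority=$(( N / 2 + 1 )) failures_tolerated=$(( N - (N / 2 + 1) ))"
    done
    # monitors=3 majority=2 failures_tolerated=1
    # monitors=4 majority=3 failures_tolerated=1
    # monitors=5 majority=3 failures_tolerated=2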

On Fri, Jul 26, 2013 at 11:29 AM, Hariharan Thantry <thantry@xxxxxxxxx> wrote:
> Hi John,
>
> Thanks for the responses.
>
> For (a), I remember reading somewhere that one can only run a maximum of 1
> monitor/node. I assume that implies the single monitor process will be
> responsible for ALL Ceph clusters on that node, correct?
>
> So (b) isn't really a Ceph issue, that's nice to know. Any recommendations
> on the minimum kernel/glibc version and minimum RAM size requirements where
> Ceph can be run on a single client in native mode? The reason I ask is that
> in a few deployment scenarios (especially non-standard ones like telco
> platforms), hardware gets added gradually, so it's more important to be able
> to scale the cluster out gracefully. I actually see Ceph as an alternative
> to a SAN, using JBODs from machines to create a larg(ish) storage cluster.
> Plus, usually, the clients would probably be running on the same hardware
> as the OSD/MON, because space on the chassis is at a premium.
>
> (d) I was thinking about single-node failure scenarios. With 3 nodes,
> wouldn't a failure of 1 node cause Paxos to not work?
>
> Thanks,
> Hari
>
> On Fri, Jul 26, 2013 at 10:00 AM, John Wilkins <john.wilkins@xxxxxxxxxxx> wrote:
>>
>> (a) Yes. See
>> http://ceph.com/docs/master/rados/configuration/ceph-conf/#running-multiple-clusters
>> and
>> http://ceph.com/docs/master/rados/deployment/ceph-deploy-new/#naming-a-cluster
>> (b) Yes. See
>> http://wiki.ceph.com/03FAQs/01General_FAQ#How_Can_I_Give_Ceph_a_Try.3F
>> Mounting kernel modules on the same node as Ceph daemons can cause
>> older kernels to deadlock.
>> (c) Someone else can probably answer that better than me.
>> (d) At least three. Paxos requires a simple majority, so 2 out of 3 is
>> sufficient. See
>> http://ceph.com/docs/master/rados/configuration/mon-config-ref/#background
>> particularly the monitor quorum section.
>>
>> On Wed, Jul 24, 2013 at 4:03 PM, Hariharan Thantry <thantry@xxxxxxxxx> wrote:
>> > Hi folks,
>> >
>> > Some very basic questions.
>> >
>> > (a) Can I be running more than 1 Ceph cluster on the same node? (Assume
>> > that I have no more than 1 monitor/node, but storage is contributed by
>> > one node into more than 1 cluster.)
>> > (b) Are there any issues with running Ceph clients on the same node as
>> > the other Ceph storage cluster entities (OSD/MON)?
>> > (c) Is the best way to access the Ceph storage cluster in native mode by
>> > multiple clients to host a shared-disk filesystem on top of RBD (like
>> > OCFS2)? What if these clients were running inside VMs? Could one then
>> > create independent partitions on top of RBD and give a partition to each
>> > of the VMs?
>> > (d) Isn't the realistic minimum for the # of monitors in a cluster at
>> > least 4 (to guard against one failure)?
>> >
>> > Thanks,
>> > Hari
>> >
>> > _______________________________________________
>> > ceph-users mailing list
>> > ceph-users@xxxxxxxxxxxxxx
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>>
>> --
>> John Wilkins
>> Senior Technical Writer
>> Inktank
>> john.wilkins@xxxxxxxxxxx
>> (415) 425-9599
>> http://inktank.com

--
John Wilkins
Senior Technical Writer
Inktank
john.wilkins@xxxxxxxxxxx
(415) 425-9599
http://inktank.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com