Re: osds on 2 nodes vs. on one node


 



On Fri, Sep 4, 2015 at 12:24 AM, Deneau, Tom <tom.deneau@xxxxxxx> wrote:
> After running some other experiments, I see now that the high single-node
> bandwidth only occurs when ceph-mon is also running on that same node.
> (In these small clusters I only had one ceph-mon running).
> If I compare to a single node where ceph-mon is not running, I see
> basically identical performance to the two-node arrangement.
>
> So now my question is:  Is it expected that there would be such
> a large performance difference between using osds on a single node
> where ceph-mon is running vs. using osds on a single node where
> ceph-mon is not running?

No. There's clearly some kind of weird confound going on here.
Honestly my first thought (I haven't heard of anything like this
before) is that you might want to look at the power-saving profile of
your nodes. Maybe the extra load of the monitor is keeping the CPU
awake or something...
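
If you want to rule that out, something like the following should show the
current CPU frequency governor on each node (exact sysfs paths and whether
the cpupower tool is installed depend on your distro, so treat this as a
sketch):

   cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor   # per-core governor
   cpupower frequency-info                                     # if cpupower is installed
   sudo cpupower frequency-set -g performance                  # pin to "performance" for a re-test

If the otherwise-idle node is sitting in a powersave governor, the extra
load from the monitor keeping the CPUs clocked up could explain it.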
-Greg

>
> -- Tom
>
>> -----Original Message-----
>> From: Deneau, Tom
>> Sent: Thursday, September 03, 2015 10:39 AM
>> To: 'Christian Balzer'; ceph-users
>> Subject: RE:  osds on 2 nodes vs. on one node
>>
>> Rewording to remove confusion...
>>
>> Config 1: set up a cluster with 1 node with 6 OSDs
>> Config 2: identical hardware, set up a cluster with 2 nodes with 3 OSDs each
>>
>> In each case I do the following:
>>    1) rados bench write --no-cleanup the same number of 4M size objects
>>    2) drop caches on all osd nodes
>>    3) rados bench seq  -t 4 to sequentially read the objects
>>       and record the read bandwidth
>>
>> Rados bench is running on a separate client, not on an OSD node.
>> The client has plenty of spare CPU power and the network and disk utilization
>> are not limiting factors.
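>>
>> Concretely, the three steps map to something like this (the pool name and
>> run length below are placeholders, not the exact values used):
>>
>>    rados bench -p testpool 60 write --no-cleanup      # 1) write objects (4M is the default object size)
>>    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches   # 2) run on every OSD node
>>    rados bench -p testpool 60 seq -t 4                # 3) sequential read with 4 concurrent ops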
>>
>> With Config 1, I see approximately 70% more sequential read bandwidth than
>> with Config 2.
>>
>> In both cases the primary OSDs of the objects appear evenly distributed
>> across the OSDs.
>>
>> Yes, replication factor is 2 but since we are only measuring read
>> performance, I don't think that matters.
>>
>> The question is whether there is a ceph parameter that might be throttling the
>> 2-node configuration.
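>>
>> For reference, the throttle-related options an OSD is actually running with
>> can be dumped on its host via the admin socket, e.g. (osd.0 is just an
>> example id):
>>
>>    ceph daemon osd.0 config show | grep -i throttle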
>>
>> -- Tom
>>
>> > -----Original Message-----
>> > From: Christian Balzer [mailto:chibi@xxxxxxx]
>> > Sent: Wednesday, September 02, 2015 7:29 PM
>> > To: ceph-users
>> > Cc: Deneau, Tom
>> > Subject: Re:  osds on 2 nodes vs. on one node
>> >
>> >
>> > Hello,
>> >
>> > On Wed, 2 Sep 2015 22:38:12 +0000 Deneau, Tom wrote:
>> >
>> > > In a small cluster I have 2 OSD nodes with identical hardware, each
>> > > with
>> > > 6 osds.
>> > >
>> > > * Configuration 1:  I shut down the OSDs on one node so I am using 6
>> > > OSDs on a single node
>> > >
>> > Shut down how?
>> > Just a "service blah stop" or actually removing them from the cluster
>> > aka CRUSH map?
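>> >
>> > For reference, the two cases look quite different, roughly (the OSD id
>> > and the init system here are examples only):
>> >
>> >    # just stopping the daemon (OSD stays in the CRUSH map, goes "down")
>> >    ceph osd set noout            # optional: keep the cluster from rebalancing
>> >    systemctl stop ceph-osd@3     # or "service ceph stop osd.3" on sysvinit
>> >
>> >    # actually taking it out of data placement
>> >    ceph osd out 3
>> >    ceph osd crush remove osd.3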
>> >
>> > > * Configuration 2:  I shut down 3 OSDs on each node so now I have 6
>> > > OSDs total, but 3 on each node.
>> > >
>> > Same as above.
>> > And in this case even more relevant, because just shutting down random
>> > OSDs on both nodes would result in massive recovery action at best and
>> > more likely a broken cluster.
>> >
>> > > I measure read performance using rados bench from a separate client node.
>> > Default parameters?
>> >
>> > > The client has plenty of spare CPU power and the network and disk
>> > > utilization are not limiting factors. In all cases, the pool type is
>> > > replicated so we're just reading from the primary.
>> > >
>> > Replicated as in size 2?
>> > We can guess/assume that from your cluster size, but without you telling
>> > us, or giving us the various config/CRUSH outputs, that is only a guess.
>> >
>> > > With Configuration 1, I see approximately 70% more bandwidth than
>> > > with configuration 2.
>> >
>> > Never mind that bandwidth is mostly irrelevant in real life: which
>> > bandwidth, read or write?
>> >
>> > > In general, any configuration where the OSDs span 2 nodes gets
>> > > poorer performance, particularly when the 2 nodes have equal
>> > > amounts of traffic.
>> > >
>> >
>> > Again, guessing from what you're actually doing, this isn't particularly
>> > surprising.
>> > Because with a single node, default rules and replication of 2, your
>> > OSDs never have to replicate anything when it comes to writes.
>> > Whereas with 2 nodes replication happens and takes more time (latency)
>> > and might also saturate your network (we have of course no idea what
>> > your cluster looks like).
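>> >
>> > You can see what is actually in play with something along these lines
>> > (the pool name is a placeholder):
>> >
>> >    ceph osd pool get <poolname> size    # replication factor
>> >    ceph osd tree                        # which OSDs are up/in and on which host
>> >    ceph osd crush rule dump             # the placement rules being used
>> >    ceph -s                              # overall health, degraded PGs, etc.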
>> >
>> > Christian
>> >
>> > > Is there any ceph parameter that might be throttling the cases where
>> > > osds span 2 nodes?
>> > >
>> > > -- Tom Deneau, AMD
>> >
>> >
>> > --
>> > Christian Balzer        Network/Systems Engineer
>> > chibi@xxxxxxx       Global OnLine Japan/Fusion Communications
>> > http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


