Re: osds on 2 nodes vs. on one node

On 09/03/2015 10:39 AM, Deneau, Tom wrote:
Rewording to remove confusion...

Config 1: set up a cluster with 1 node with 6 OSDs
Config 2: identical hardware, set up a cluster with 2 nodes with 3 OSDs each

In each case I do the following:
    1) rados bench write --no-cleanup to write the same number of 4M objects
    2) drop caches on all OSD nodes
    3) rados bench seq -t 4 to sequentially read the objects
       and record the read bandwidth
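
Concretely, the sequence looks roughly like this (the pool name "bench" and the 60-second durations are placeholders, not the exact values used here):

    # 1) write 4M objects and keep them around for the read test
    rados bench -p bench 60 write --no-cleanup

    # 2) on each OSD node, flush the page cache before reading
    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

    # 3) sequential read with 4 concurrent ops; record the bandwidth
    rados bench -p bench 60 seq -t 4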

Rados bench is running on a separate client, not on an OSD node.
The client has plenty of spare CPU power and the network and disk
utilization are not limiting factors.

With Config 1, I see approximately 70% more sequential read bandwidth than with Config 2.

Out of curiosity, have you tried 6 OSDs just on the 2nd node?


In both cases, the primary OSDs of the objects appear to be evenly distributed across all OSDs.
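
One way to check that, assuming the bench objects are still in the pool ("bench" again being a placeholder): map each object to its PG and note the primary, i.e. the first OSD in the acting set.

    # print the PG mapping for every benchmark object; the primary
    # is the first OSD listed in the acting set
    for obj in $(rados -p bench ls); do
        ceph osd map bench "$obj"
    done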

Yes, the replication factor is 2, but since we are only measuring read performance, I don't think that matters.

The question is whether there is a Ceph parameter that might be throttling the 2-node configuration.

It sounds like some kind of network wonkiness, but who knows. Maybe try some concurrent network tests from the OSD nodes to the client, just to make sure nothing strange is going on when both OSD nodes send data to the client at the same time.
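
Something like the following, with iperf3 as a stand-in for whatever network test tool you prefer (the client IP is obviously a placeholder), would at least exercise both nodes sending at once:

    # on the client
    iperf3 -s

    # on each OSD node, started at the same time, so both nodes
    # push data to the client concurrently
    iperf3 -c <client-ip> -t 30 -P 4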

What's the behavior like over time? Is throughput on the fast setup stable? Is the slow setup spiky? Consistently low? How's the latency spread in each case?
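
A crude way to look at that: keep the per-second rados bench output from each config and watch per-OSD latencies while it runs (pool and file names here are just examples):

    # rados bench already prints per-second throughput plus
    # min/avg/max latency; keep it so the configs can be compared
    rados bench -p bench 60 seq -t 4 | tee config2-seq.log

    # rough per-OSD commit/apply latency view during the run
    watch -n 1 ceph osd perf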


-- Tom

-----Original Message-----
From: Christian Balzer [mailto:chibi@xxxxxxx]
Sent: Wednesday, September 02, 2015 7:29 PM
To: ceph-users
Cc: Deneau, Tom
Subject: Re:  osds on 2 nodes vs. on one node


Hello,

On Wed, 2 Sep 2015 22:38:12 +0000 Deneau, Tom wrote:

In a small cluster I have 2 OSD nodes with identical hardware, each with 6 OSDs.

* Configuration 1:  I shut down the OSDs on one node so I am using 6 OSDs on a single node

Shut down how?
Just a "service blah stop" or actually removing them from the cluster aka
CRUSH map?
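
The difference matters; roughly (osd.3 being an arbitrary example):

    # just stops the daemon: the OSD is marked down,
    # but data is still mapped to it
    service ceph stop osd.3        # or: systemctl stop ceph-osd@3

    # actually removes it from data placement / the CRUSH map
    ceph osd out 3
    ceph osd crush remove osd.3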

* Configuration 2:  I shut down 3 OSDs on each node so now I have 6 total OSDs but 3 on each node.

Same as above.
And in this case even more relevant, because just shutting down random OSDs
on both nodes would result in massive recovery action at best and more likely
a broken cluster.
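
If you only meant to stop the daemons for the test, the usual way to avoid that recovery storm is:

    # keep stopped OSDs from being marked out,
    # so no re-replication starts while they are down
    ceph osd set noout
    # ... stop OSDs, run the benchmark ...
    ceph osd unset noout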

I measure read performance using rados bench from a separate client node.
Default parameters?

The client has plenty of spare CPU power, and the network and disk utilization are not limiting factors. In all cases the pool type is replicated, so we're just reading from the primary.

Replicated as in size 2?
We can guess/assume that from your cluster size, but without you telling us or giving us the various config/CRUSH outputs, that is only a guess.
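
The outputs that would settle it, more or less:

    ceph osd tree                    # node/OSD layout as CRUSH sees it
    ceph osd dump | grep '^pool'     # pool size, min_size, pg_num, rule
    ceph osd crush rule dump         # the placement rules actually in use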

With Configuration 1, I see approximately 70% more bandwidth than with Configuration 2.

Never mind that bandwidth is mostly irrelevant in real life: which bandwidth, read or write?

In general, any configuration where the osds span 2 nodes gets poorer
performance but in particular when the 2 nodes have equal amounts of
traffic.


Again, guessing from what you're actually doing, this isn't particularly surprising.
With a single node, default rules, and replication of 2, your OSDs never have to replicate anything when it comes to writes.
With 2 nodes, replication happens, which takes more time (latency) and might also saturate your network (we of course have no idea what your cluster looks like).
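
Assuming the default host-level failure domain, that effect is visible directly: with size 2 and only one host up, the PGs should report undersized/degraded, i.e. only one copy is ever written.

    # with size 2, a host-level failure domain, and one node up,
    # expect undersized/degraded PGs -- only the primary copy is written
    ceph -s
    ceph pg stat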

Christian

Is there any Ceph parameter that might be throttling the cases where OSDs span 2 nodes?

-- Tom Deneau, AMD



--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx   	Global OnLine Japan/Fusion Communications
http://www.gol.com/



