Rewording to remove confusion...

Config 1: set up a cluster with 1 node with 6 OSDs
Config 2: identical hardware, set up a cluster with 2 nodes with 3 OSDs each

In each case I do the following (a rough command sketch appears after the quoted thread below):
  1) rados bench write --no-cleanup the same number of 4M-size objects
  2) drop caches on all OSD nodes
  3) rados bench seq -t 4 to sequentially read the objects and record the read bandwidth

Rados bench is running on a separate client, not on an OSD node.  The client has plenty of spare CPU power, and the network and disk utilization are not limiting factors.

With Config 1, I see approximately 70% more sequential read bandwidth than with Config 2.  In both cases the primary OSDs of the objects appear evenly distributed across the OSDs.

Yes, the replication factor is 2, but since we are only measuring read performance, I don't think that matters.

The question is whether there is a Ceph parameter that might be throttling the 2-node configuration.

-- Tom

> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Wednesday, September 02, 2015 7:29 PM
> To: ceph-users
> Cc: Deneau, Tom
> Subject: Re: osds on 2 nodes vs. on one node
>
> Hello,
>
> On Wed, 2 Sep 2015 22:38:12 +0000 Deneau, Tom wrote:
>
> > In a small cluster I have 2 OSD nodes with identical hardware, each
> > with 6 osds.
> >
> > * Configuration 1: I shut down the osds on one node so I am using 6
> >   OSDs on a single node
> >
> Shut down how?
> Just a "service blah stop" or actually removing them from the cluster, aka
> the CRUSH map?
>
> > * Configuration 2: I shut down 3 osds on each node so now I have 6
> >   total OSDs but 3 on each node.
> >
> Same as above.
> And in this case even more relevant, because just shutting down random OSDs
> on both nodes would result in massive recovery action at best and more likely
> a broken cluster.
>
> > I measure read performance using rados bench from a separate client node.
>
> Default parameters?
>
> > The client has plenty of spare CPU power and the network and disk
> > utilization are not limiting factors.  In all cases, the pool type is
> > replicated so we're just reading from the primary.
> >
> Replicated as in size 2?
> We can guess/assume that from your cluster size, but w/o you telling us or
> giving us all the various config/crush outputs that is only a guess.
>
> > With Configuration 1, I see approximately 70% more bandwidth than with
> > Configuration 2.
>
> Never mind that bandwidth is mostly irrelevant in real life: which bandwidth,
> read or write?
>
> > In general, any configuration where the osds span 2 nodes gets poorer
> > performance, but in particular when the 2 nodes have equal amounts of
> > traffic.
> >
> Again, guessing from what you're actually doing, this isn't particularly
> surprising.
> Because with a single node, default rules and a replication of 2, your OSDs
> never have to replicate anything when it comes to writes.
> Whereas with 2 nodes replication happens and takes more time (latency) and
> might also saturate your network (we of course have no idea what your cluster
> looks like).
>
> Christian
>
> > Is there any ceph parameter that might be throttling the cases where
> > osds span 2 nodes?
> >
> > -- Tom Deneau, AMD
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
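
For anyone trying to reproduce this, the test sequence Tom describes boils down to roughly the following. This is only a sketch: the pool name ("bench") and the 60-second run lengths are assumptions, not taken from the thread, and rados bench writes 4 MB objects by default.

  # 1) write phase; --no-cleanup keeps the objects around for the read test
  rados bench -p bench 60 write --no-cleanup

  # 2) on every OSD node, flush and drop the page cache so the reads
  #    actually hit the disks (run as root)
  sync && echo 3 > /proc/sys/vm/drop_caches

  # 3) sequential read of the objects written above, 4 concurrent ops,
  #    run from the same client that did the write phase
  rados bench -p bench 60 seq -t 4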
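
The cluster state Christian is asking about can be captured with a few commands, which also show one way to take OSDs out of service for an A/B test like this without triggering recovery. The pool name, OSD id, and service commands below are placeholders, not details taken from the thread:

  # replication size, CRUSH layout and the rule actually in use
  ceph osd pool get bench size
  ceph osd tree
  ceph osd crush rule dump

  # spot-check that primaries are spread evenly: map a few of the
  # benchmark objects (take names from "rados -p bench ls")
  ceph osd map bench <object-name>

  # stop OSDs for the test without the cluster rebalancing data
  # onto the remaining OSDs
  ceph osd set noout
  systemctl stop ceph-osd@3      # or "service ceph stop osd.3" on sysvinit

  # afterwards
  systemctl start ceph-osd@3
  ceph osd unset noout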