Rewording to remove confusion...

Config 1: set up a cluster with 1 node with 6 OSDs
Config 2: identical hardware, set up a cluster with 2 nodes with 3 OSDs each

In each case I do the following (a rough command sketch appears after the quoted thread below):
  1) rados bench write --no-cleanup the same number of 4M-size objects
  2) drop caches on all OSD nodes
  3) rados bench seq -t 4 to sequentially read the objects and record the read bandwidth

Rados bench is running on a separate client, not on an OSD node.  The client has plenty of spare CPU power, and the network and disk utilization are not limiting factors.

With Config 1, I see approximately 70% more sequential read bandwidth than with Config 2.  In both cases the primary OSDs of the objects appear evenly distributed across the OSDs.

Yes, the replication factor is 2, but since we are only measuring read performance, I don't think that matters.

The question is whether there is a Ceph parameter that might be throttling the 2-node configuration.

-- Tom

> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Wednesday, September 02, 2015 7:29 PM
> To: ceph-users
> Cc: Deneau, Tom
> Subject: Re: osds on 2 nodes vs. on one node
>
> Hello,
>
> On Wed, 2 Sep 2015 22:38:12 +0000 Deneau, Tom wrote:
>
> > In a small cluster I have 2 OSD nodes with identical hardware, each
> > with 6 osds.
> >
> > * Configuration 1: I shut down the osds on one node so I am using 6
> >   OSDs on a single node
> >
> Shut down how?
> Just a "service blah stop" or actually removing them from the cluster, aka
> the CRUSH map?
>
> > * Configuration 2: I shut down 3 osds on each node so now I have 6
> >   total OSDs but 3 on each node.
> >
> Same as above.
> And in this case even more relevant, because just shutting down random OSDs
> on both nodes would result in massive recovery action at best and more likely
> a broken cluster.
>
> > I measure read performance using rados bench from a separate client node.
>
> Default parameters?
>
> > The client has plenty of spare CPU power and the network and disk
> > utilization are not limiting factors.  In all cases, the pool type is
> > replicated so we're just reading from the primary.
> >
> Replicated as in size 2?
> We can guess/assume that from your cluster size, but w/o you telling us or
> giving us all the various config/crush outputs that is only a guess.
>
> > With Configuration 1, I see approximately 70% more bandwidth than with
> > Configuration 2.
>
> Never mind that bandwidth is mostly irrelevant in real life: which bandwidth,
> read or write?
>
> > In general, any configuration where the osds span 2 nodes gets poorer
> > performance, but in particular when the 2 nodes have equal amounts of
> > traffic.
> >
> Again, guessing from what you're actually doing, this isn't particularly
> surprising.
> Because with a single node, default rules and a replication of 2, your OSDs
> never have to replicate anything when it comes to writes.
> Whereas with 2 nodes replication happens and takes more time (latency) and
> might also saturate your network (we of course have no idea what your cluster
> looks like).
>
> Christian
>
> > Is there any ceph parameter that might be throttling the cases where
> > osds span 2 nodes?
> >
> > -- Tom Deneau, AMD
>
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
> http://www.gol.com/
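
For anyone trying to reproduce this, the test sequence Tom describes boils down to roughly the following. This is only a sketch: the pool name ("bench") and the 60-second run lengths are assumptions, not taken from the thread, and rados bench writes 4 MB objects by default.

  # 1) write phase; --no-cleanup keeps the objects around for the read test
  rados bench -p bench 60 write --no-cleanup

  # 2) on every OSD node, flush and drop the page cache so the reads
  #    actually hit the disks (run as root)
  sync && echo 3 > /proc/sys/vm/drop_caches

  # 3) sequential read of the objects written above, 4 concurrent ops,
  #    run from the same client that did the write phase
  rados bench -p bench 60 seq -t 4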
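
The cluster state Christian is asking about can be captured with a few commands, which also show one way to take OSDs out of service for an A/B test like this without triggering recovery. The pool name, OSD id, and service commands below are placeholders, not details taken from the thread:

  # replication size, CRUSH layout and the rule actually in use
  ceph osd pool get bench size
  ceph osd tree
  ceph osd crush rule dump

  # spot-check that primaries are spread evenly: map a few of the
  # benchmark objects (take names from "rados -p bench ls")
  ceph osd map bench <object-name>

  # stop OSDs for the test without the cluster rebalancing data
  # onto the remaining OSDs
  ceph osd set noout
  systemctl stop ceph-osd@3      # or "service ceph stop osd.3" on sysvinit

  # afterwards
  systemctl start ceph-osd@3
  ceph osd unset noout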