On Tue, 30 Dec 2014 08:22:01 +1000 Lindsay Mathieson wrote:

> On Mon, 29 Dec 2014 11:29:11 PM Christian Balzer wrote:
> > Reads will scale up (on a cluster basis, individual clients might
> > not benefit as much) linearly with each additional "device"
> > (host/OSD).
>
> I'm taking that to mean individual clients as a whole will be limited
> by the speed of individual OSD's, but multiple clients will spread
> their reads between multiple OSD's, leading to a higher aggregate
> bandwidth than individual disks could sustain.
>
A single client like a VM or an application (see the rados bench
threads) might of course do things in parallel, too, and thus benefit
from accessing multiple OSDs on multiple nodes at the same time.
However a client doing a single, sequential read won't improve much, of
course (the fact that there are more OSDs with less spindle competition
may still help, though).

> I guess the limiting factor there would be network.
>
For bandwidth/throughput, most likely, and certainly in your case.
But bandwidth tends to become the least of your concerns very quickly;
IOPS is where bottlenecks appear first. And there, aside from the
obvious limitations of your disks (and SSDs), the next bottleneck,
which surprises most people, is the CPU.

> > Writes will scale up with each additional device divided by replica
> > size.
>
> So adding OSD's will increase write speed from individual clients?
>
Adding OSDs helps in and of itself, because activity can be distributed
between more spindles (HDDs). So you can certainly increase the speed
of your current storage nodes by adding more OSDs (let's say 4 per
node). However, increasing the node count and the replica size to 3
will not improve things, rather the opposite: simply put, in that
configuration each node has to do the same work as the others, plus
overhead and the limitations imposed by things like the network. Once
you add a 4th node, things speed up again.

> seq writes go out to different OSD's simultaneously?
>
Unless there are multiple threads, no. But given the default object
size of 4MB, they go to different OSDs sequentially, and rather quickly
so.

> > Fun fact, if you have 1 node with replica 1 and add 2 more identical
> > nodes and increase the replica to 3, your write performance will be
> > less than 50% of the single node.
>
> Interesting - this seems to imply that writes go to the replica OSD's
> one after another, rather than simultaneously like I expected.
>
There is a graphic in the Ceph documentation:
http://ceph.com/docs/master/architecture/#smart-daemons-enable-hyperscale

The numbering of the requests there suggests sequential operation, but
even if the primary OSD sends the data to the secondary one(s) in
parallel, your network bandwidth and LATENCY, as well as the activity
on those nodes and OSDs, will of course delay things compared to just a
single, local write.

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Fusion Communications
http://www.gol.com/
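
P.S. To make the parallelism point above concrete, here is a rough
sketch with the python-rados bindings. The pool, object names and sizes
are invented and I haven't run this against a cluster, so treat it as a
sketch of the idea, not a benchmark:

#!/usr/bin/env python
# Sketch: sequential vs. parallel reads via python-rados.
# Assumes a pool "testpool" that already holds 4MB objects named
# obj_0 .. obj_15; adjust conffile/pool/names for your setup.
import rados

OBJ_SIZE = 4 * 1024 * 1024

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('testpool')

# Sequential: each read has to finish before the next one starts,
# so a single reader is bound by one OSD (spindle) at a time.
for i in range(16):
    ioctx.read('obj_%d' % i, OBJ_SIZE, 0)

# Parallel: fire off async reads; CRUSH maps the objects to
# different primary OSDs, so many spindles work at the same time.
results = {}

def on_complete(completion, data):
    results[id(completion)] = data

completions = [ioctx.aio_read('obj_%d' % i, OBJ_SIZE, 0, on_complete)
               for i in range(16)]
for c in completions:
    c.wait_for_complete()

ioctx.close()
cluster.shutdown()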
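
The "devices divided by replica size" rule of thumb is also easy to put
rough numbers on. The per-OSD throughput figure below is invented; only
the shape of the math matters:

# Back-of-the-envelope for "writes scale with devices / replica size".
def write_ceiling(osds, per_osd_mbs, replica_size):
    # Every client write lands on disk replica_size times somewhere
    # in the cluster, so raw device bandwidth is divided by it.
    return osds * per_osd_mbs / replica_size

print(write_ceiling(osds=4, per_osd_mbs=100, replica_size=1))   # 400.0
print(write_ceiling(osds=12, per_osd_mbs=100, replica_size=3))  # 400.0
# Three times the hardware, the same ceiling, and that is before
# network latency and replication overhead shave off more.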
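
And since the 4MB default object size came up: which object (and thus,
via CRUSH, which primary OSD) a byte of a sequential write lands in
follows directly from the offset. A trivial illustration:

# Default RADOS/RBD object size is 4MB, so a sequential writer moves
# on to a new object (and usually a new primary OSD) every 4MB.
OBJECT_SIZE = 4 * 1024 * 1024

def object_index(offset):
    return offset // OBJECT_SIZE

for off in (0, OBJECT_SIZE - 1, OBJECT_SIZE, 64 * OBJECT_SIZE):
    print('%10d -> object %d' % (off, object_index(off)))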
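
Lastly, a toy latency model of the replicated write path: even with the
primary fanning the data out to the replicas in parallel, the client
only gets its ack once the slowest replica (plus a network round trip)
is done. All numbers here are invented for illustration:

# Toy model: a replicated write completes when the local write and
# the slowest replica write (plus network round trip) have finished.
def replicated_write_ms(local_ms, replica_ms_list, net_rtt_ms):
    if not replica_ms_list:  # replica size 1: just the local write
        return local_ms
    return max(local_ms, max(r + net_rtt_ms for r in replica_ms_list))

print(replicated_write_ms(10, [], 0))        # 10ms, single local write
print(replicated_write_ms(10, [10, 10], 5))  # 15ms, replica 3 over a LAN
print(replicated_write_ms(10, [10, 40], 5))  # 45ms, one busy replica
                                             # drags the whole write down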