Re: Ceph Tuning + KV backend

You actually can't know what the network contention is like: you only see virtual NICs, which are oversubscribed on the physical hosts, and the backbone between AWS racks/datacenters is most likely oversubscribed as well.
The same goes for CPU and RAM. Depending on your kernel and on how AWS is set up, the CPUs in the guests may look idle because the work has already left your domain (the guest), while the host is struggling at that very moment without you knowing. Sometimes this shows up as "steal" time, sometimes it does not. Your guest can be completely idle because it has handed the data "out" (to the virtual drive's cache, to the network buffers via DMA), yet the host still has work to do.
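One quick way to tell whether the hypervisor is taking CPU away from your guest is to watch the steal column while the benchmark runs. A minimal sketch (tool availability and column names vary by distribution):

    # "st" / "%steal" is CPU time the hypervisor handed to other guests
    vmstat 1          # watch the "st" column on the right
    mpstat -P ALL 1   # per-CPU "%steal" (mpstat comes from the sysstat package)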

Rerun the test at different times of day, or create the same setup in a different AWS zone and compare the results.
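If you want to automate that, a timestamped loop around a short write benchmark makes the runs easy to compare afterwards. A rough sketch, assuming a throwaway pool named "bench" exists:

    # 60-second write benchmark once an hour, with timestamped logs
    while true; do
        date >> bench.log
        rados bench -p bench 60 write >> bench.log 2>&1
        sleep 3600
    done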

I'm not sure what service-level settings AWS offers, but if there is some kind of resource reservation, using it is exactly what you should do to get meaningful numbers.

If you want to identify the bottleneck, you need to measure all the metrics at the same time: at a minimum a latency test between the nodes on all networks (ping, or arping if they share a subnet), a small fio read+write test on the virtual disks, and a scheduling-latency test for the CPUs (cyclictest from rt-tests, for example). Whichever metric jumps up under load is the bottleneck.
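A rough sketch of running the three checks side by side for one minute (the hostname, file path and log names are placeholders):

    # network latency to another node (use arping instead if on the same subnet)
    ping -i 0.2 -w 60 other-node > ping.log &

    # small random read/write test on the virtual disk holding the journal
    fio --name=probe --filename=/var/lib/ceph/probe.fio --size=1G \
        --rw=randrw --bs=4k --direct=1 --time_based --runtime=60 > fio.log &

    # CPU scheduling latency (cyclictest is part of the rt-tests package)
    cyclictest -q -D 60 > cyclictest.log &

    wait   # afterwards, compare which of the three metrics spiked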

Jan

> On 08 Sep 2015, at 21:00, Niels Jakob Darger <jakob@xxxxxxxxxx> wrote:
> 
> Hello,
> 
> Excuse my ignorance; I have just joined this list and started using Ceph (which looks very cool). On AWS I have set up a 5-node Ceph cluster (4 vCPUs, 32 GB RAM, dedicated SSDs for the system, the OSD and the journal) with the Object Gateway. To keep the test simple, all the nodes are identical and each node runs an OSD, a mon and radosgw.
> 
> I have run parallel inserts from all 5 nodes and can insert about 10,000-12,000 objects per minute. The insert rate is roughly constant regardless of whether I run 1 insert process per node or 5, i.e. 5 or 25 in total.
> 
> These are just numbers, of course, and not meaningful without more context. But looking at the nodes, I think the cluster could run faster: the CPUs are not doing much and there isn't much I/O wait (only about 50% utilisation, and only on the SSDs holding the journals on two of the nodes, since I've set the replication factor to 2); the other file systems are almost idle. The network is far from maxed out and the processes are not using much memory. I've tried increasing osd_op_threads to 5 or 10, but that didn't make much difference.
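(For reference, that tuning attempt would look roughly like this in ceph.conf; the option name matches the Hammer-era documentation and should be double-checked against the installed release:)

    [osd]
    # more threads servicing the OSD op queue; the default was 2 at the time
    osd op threads = 10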
> 
> The co-location of all the daemons on all the nodes may not be ideal, but since there isn't much resource use or contention I don't think that's the problem.
> 
> So two questions:
> 
> 1) Are there any good resources on tuning Ceph? There are quite a few posts out there testing and timing specific setups with RAID controller X and 12 disks of brand Y, etc., but I'm looking more for general tuning guidelines that explain the big picture.
> 
> 2) What's the status of the key-value backend? The documentation at http://ceph.com/docs/master/rados/configuration/keyvaluestore-config-ref/ looks nice, but I found it hard to work out how to switch to it. The Internet suggests "osd objectstore = keyvaluestore-dev", but that didn't seem to work, so I checked the source code and it looks like "osd objectstore = keyvaluestore" does it. However, that puts nasty things in the log file ("*** experimental feature 'keyvaluestore' is not enabled *** This feature is marked as experimental ..."), so perhaps it's too early to use the KV backend in production?
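(For reference, switching the backend on a Hammer-era release takes both settings below in ceph.conf, applied before the OSDs are created; the second, deliberately alarming option is what the quoted log message is asking for. A sketch, to be verified against the running release:)

    [osd]
    # use the experimental key/value object store instead of FileStore
    osd objectstore = keyvaluestore
    # explicit opt-in for experimental features; without it the OSD logs the
    # "experimental feature 'keyvaluestore' is not enabled" warning quoted above
    enable experimental unrecoverable data corrupting features = keyvaluestore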
> 
> Thanks & regards,
> Jakob

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


