On 12/16/2013 02:42 AM, Christian Balzer wrote:
Hello,
Hi Christian!
New to Ceph, not new to replicated storage. Simple test cluster with 2 identical nodes running Debian Jessie, thus ceph 0.48. And yes, I very much prefer a distro-supported package.
I know you'd like to use the distro package, but 0.48 is positively ancient at this point. There have been a *lot* of fixes and changes since then. If it makes you feel better, our current professionally supported release is based on dumpling.
Single mon and osd1 on node a, osd2 on node b. 1GbE direct interlink between the nodes, used exclusively for this setup. Bog standard, minimum configuration, declaring a journal but that's on the same backing storage.

The backing storage can do this locally (bonnie++):

Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
irt03            8G           89267  21 60474  15           267049 37 536.9  12
Latency                        4792ms     245ms              44908us    113ms

And this with a 20GB rbd (formatted the same way, ext4, as the test above) mounted on the node that hosts the mon and osd1:

Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
irt03            8G           11525   2  5562   1            48221  6 167.3   3
Latency                        5073ms    2912ms               321ms    2841ms

I'm looking at Ceph/RBD to store VM volumes with ganeti, and these numbers frankly scare me. Watching the traffic with ethstats I never saw anything higher than this during writes (on node a):

  eth2:   72.32 Mb/s In   127.99 Mb/s Out -   8035.4 p/s In   11649.5 p/s Out

I assume the traffic coming back in is replica stuff from node b, right? What prevented it from using more than about 13% of the network link capacity?

Aside from that cringeworthy drop to 15% of the backing storage speed (and network link), which I presume might be salvageable by using an SSD journal, I'm more than puzzled by the read speed. For starters, I would have assumed that in this 2-replica setup all data is present on the local node a and Ceph would be smart enough to get it all locally. But even if it was talking to both nodes a and b (or just b), I would have expected something in the 100MB/s range.
Ceph always reads data from the primary OSD, so wherever the primary for an object is located, that's where the read will be served from. The good news is that this gives you a better probability of spreading your reads out over the whole cluster. The bad news is that you have more network traffic to deal with.
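If you're curious where the primary for a particular object lives, you can ask the cluster directly; something along these lines should still work on argonaut (the object name below is just a placeholder, pick a real one from the rados listing). The first OSD in the acting set is the primary that serves the reads:

  rados -p rbd ls | head                     # list a few of the objects backing your rbd images
  ceph osd map rbd rb.0.1234.000000000000    # placeholder object name taken from the listing above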
Any insights would be much appreciated.
With 0.48 it's kind of tough to make recommendations because I frankly don't remember everything that's changed since then. You'll probably want to make sure that syncfs is being used, and you may want to play around with enabling/disabling the filestore flusher and turning journal aio on. Looks like RBD cache was included in 0.46, so you can try enabling that too, but it had performance issues with sequential writes before cuttlefish. Something like the snippet below is where I'd start.
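A minimal ceph.conf sketch for those knobs; the values are illustrative starting points rather than tested recommendations, and some option names/defaults may differ slightly on 0.48:

  [osd]
      ; write the journal with AIO instead of buffered writes
      journal aio = true
      ; try this both ways; flusher behavior changed a lot between releases
      filestore flusher = false

  [client]
      ; librbd write-back cache (only affects librbd/QEMU, not the kernel rbd driver)
      rbd cache = true
      rbd cache size = 33554432   ; 32 MB, illustrative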
At least you'll be on a relatively modern kernel!
Regards, Christian