On 12/16/2013 02:42 AM, Christian Balzer wrote:
Hello,
Hi Christian!
New to Ceph, not new to replicated storage. Simple test cluster with 2 identical nodes running Debian Jessie, thus ceph 0.48. And yes, I very much prefer a distro-supported package.
I know you'd like to use the distro package, but 0.48 is positively ancient at this point. There have been a *lot* of fixes and changes since then. If it makes you feel better, our current professionally supported release is based on dumpling.
Single mon and osd1 on node a, osd2 on node b. 1GbE direct interlink between the nodes, used exclusively for this setup. Bog standard, minimum configuration, declaring a journal but that's on the same backing storage.

The backing storage can do this locally (bonnie++):

Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
irt03            8G           89267  21 60474  15           267049 37 536.9  12
Latency                        4792ms     245ms              44908us    113ms

And this with a 20GB rbd (formatted the same way, ext4, as the test above) mounted on the node that hosts the mon and osd1:

Version  1.97       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
irt03            8G           11525   2  5562   1            48221  6 167.3   3
Latency                        5073ms    2912ms               321ms    2841ms

I'm looking at Ceph/RBD to store VM volumes with ganeti, and these numbers frankly scare me. Watching the traffic with ethstats I never saw anything higher than this during writes (on node a):

  eth2:   72.32 Mb/s In   127.99 Mb/s Out -   8035.4 p/s In   11649.5 p/s Out

I assume the traffic coming back in is replica stuff from node b, right? What prevented it from using more than about 13% of the network link capacity?

Aside from that cringeworthy drop to 15% of the backing storage speed (and network link), which I presume might be salvageable by using an SSD journal, I'm more than puzzled by the read speed. For starters, I would have assumed that in this 2-replica setup all data is present on the local node a and Ceph would be smart enough to get it all locally. But even if it was talking to both nodes a and b (or just b), I would have expected something in the 100MB/s range.
Ceph always reads data from the primary OSD, so wherever the primary for an object is located, that's where the read will be served from. The good news is that this gives you a better probability of spreading your reads out over the whole cluster. The bad news is that you have more network traffic to deal with.
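If you're curious where the primary for a particular object lives, you can ask the cluster directly; something along these lines should still work on argonaut (the object name below is just a placeholder, pick a real one from the rados listing). The first OSD in the acting set is the primary that serves the reads:

  rados -p rbd ls | head                     # list a few of the objects backing your rbd images
  ceph osd map rbd rb.0.1234.000000000000    # placeholder object name taken from the listing above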
Any insights would be much appreciated.
With 0.48 it's kind of tough to make recommendations because I frankly don't remember everything that's changed since then. You'll probably want to make sure that syncfs is being used, and you may want to play around with enabling/disabling the filestore flusher and turning journal aio on. Looks like RBD cache was included in 0.46, so you can try enabling that too, but it had performance issues with sequential writes before cuttlefish. Something like the snippet below is where I'd start.
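A minimal ceph.conf sketch for those knobs; the values are illustrative starting points rather than tested recommendations, and some option names/defaults may differ slightly on 0.48:

  [osd]
      ; write the journal with AIO instead of buffered writes
      journal aio = true
      ; try this both ways; flusher behavior changed a lot between releases
      filestore flusher = false

  [client]
      ; librbd write-back cache (only affects librbd/QEMU, not the kernel rbd driver)
      rbd cache = true
      rbd cache size = 33554432   ; 32 MB, illustrative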
At least you'll be on a relatively modern kernel!
Regards, Christian