Hello,

On Wed, 24 Feb 2016 23:01:43 -0700 Robert LeBlanc wrote:

> With my S3500 drives in my test cluster, the latest master branch gave
> me an almost 2x increase in performance compared to just a month or two
> ago. There looks to be some really nice things coming in Jewel around
> SSD performance. My drives are now 80-85% busy doing about 10-12K IOPS
> when doing 4K fio to libRBD.
>
That's good news, but then again the future is always bright. ^o^

Before that (or even now with the SSDs still 15% idle), were you
exhausting your CPUs, or are they also still not fully utilized, as I am
seeing below?

Christian

> Sent from a mobile device, please excuse any typos.
> On Feb 24, 2016 8:10 PM, "Christian Balzer" <chibi@xxxxxxx> wrote:
>
> >
> > Hello,
> >
> > For posterity and of course to ask some questions, here are my
> > experiences with a pure SSD pool.
> >
> > SW: Debian Jessie, Ceph Hammer 0.94.5.
> >
> > HW:
> > 2 nodes (thus a replication of 2), each with:
> > 2x E5-2623 CPUs
> > 64GB RAM
> > 4x DC S3610 800GB SSDs
> > Infiniband (IPoIB) network
> >
> > Ceph: no tuning or significant/relevant config changes, OSD FS is
> > Ext4, Ceph journal is inline (journal file).
> >
> > Performance:
> > A test run with "rados -p cache bench 30 write -t 32" (4MB blocks)
> > gives me about 620MB/s; the storage nodes are I/O bound (all SSDs are
> > 100% busy according to atop), and this meshes nicely with the speeds
> > I saw when testing the individual SSDs with fio before involving Ceph.
> >
> > To elaborate on that, an individual SSD of that type can do about
> > 500MB/s sequential writes, so ideally you would see 1GB/s writes with
> > Ceph (500 * 8 / 2 (replication) / 2 (journal on same disk)).
> > However, my experience tells me that other activities (FS journals,
> > leveldb PG updates, etc.) impact things as well.
> >
> > A test run with "rados -p cache bench 30 write -t 32 -b 4096" (4KB
> > blocks) gives me about 7200 IOPS; the SSDs are about 40% busy.
> > All OSD processes are using about 2 cores and the OS another 2, but
> > that leaves about 6 cores unused (MHz on all cores scales to max
> > during the test run).
> > Closer inspection with all CPUs being displayed in atop shows that no
> > single core is fully used; they all average around 40%, and even the
> > busiest ones (handling IRQs) still have ample capacity available.
> > I'm wondering if this is an indication of insufficient parallelism or
> > if it's latency of some sort.
> > I'm aware of the many tuning settings for SSD-based OSDs; however, I
> > was expecting to run into a CPU wall first and foremost.
> >
> >
> > Write amplification:
> > 10-second rados bench with 4MB blocks, 6348MB written in total.
> > NAND writes per SSD: 118 * 32MB = 3776MB.
> > 30208MB total written to all SSDs.
> > Amplification: 4.75
> >
> > Very close to what you would expect with a replication of 2 and the
> > journal on the same disk.
> >
> >
> > 10-second rados bench with 4KB blocks, 219MB written in total.
> > NAND writes per SSD: 41 * 32MB = 1312MB.
> > 10496MB total written to all SSDs.
> > Amplification: 48!!!
> >
> > Le ouch.
> > In my use case, with rbd cache on all VMs, I expect writes to be
> > rather large for the most part and not like this extreme example.
> > But as I wrote the last time I did this kind of testing, this is an
> > area where caveat emptor most definitely applies when planning and
> > buying SSDs. And where the Ceph code could probably do with some
> > attention.
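
For anyone who wants to redo the math quoted above, here it is as a
quick Python sketch. Purely illustrative: the 500MB/s per-SSD figure,
the 32MB NAND-write units and the bench totals are just the numbers
from the post, while the 8 SSDs, replication of 2 and
journal-on-the-same-disk factors are assumptions specific to this
particular setup.

SSDS = 8                  # 2 nodes x 4 SSDs
REPLICATION = 2
JOURNAL_FACTOR = 2        # journal file on the same SSD doubles writes
PER_SSD_SEQ_MBS = 500     # per-SSD sequential write speed seen with fio
NAND_UNIT_MB = 32         # the drives count NAND writes in 32MB units

# Theoretical ceiling for large sequential writes across the pool:
ceiling = PER_SSD_SEQ_MBS * SSDS / REPLICATION / JOURNAL_FACTOR
print("expected ceiling: %d MB/s" % ceiling)          # 1000 MB/s

def amplification(nand_units_per_ssd, client_mb):
    """Total NAND MB written on all SSDs divided by client MB written."""
    return nand_units_per_ssd * NAND_UNIT_MB * SSDS / float(client_mb)

print("4MB bench: %.2fx" % amplification(118, 6348))  # ~4.76x
print("4KB bench: %.2fx" % amplification(41, 219))    # ~47.9x

Nothing fancy, but it makes it easy to plug in a different drive count
or a dedicated journal device (JOURNAL_FACTOR = 1) and see where the
ceiling moves.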
> >
> > Regards,
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com