Hello,

On Wed, 24 Feb 2016 23:01:43 -0700 Robert LeBlanc wrote:

> With my S3500 drives in my test cluster, the latest master branch gave
> me an almost 2x increase in performance compared to just a month or two
> ago. There looks to be some really nice things coming in Jewel around
> SSD performance. My drives are now 80-85% busy doing about 10-12K IOPS
> when doing 4K fio to libRBD.
>
That's good news, but then again the future is always bright. ^o^

Before that (or even now with the SSDs still 15% idle), were you
exhausting your CPUs, or are they also still not fully utilized, as I am
seeing below?

Christian

> Sent from a mobile device, please excuse any typos.
> On Feb 24, 2016 8:10 PM, "Christian Balzer" <chibi@xxxxxxx> wrote:
>
> >
> > Hello,
> >
> > For posterity and of course to ask some questions, here are my
> > experiences with a pure SSD pool.
> >
> > SW: Debian Jessie, Ceph Hammer 0.94.5.
> >
> > HW:
> > 2 nodes (thus a replication of 2), each with:
> > 2x E5-2623 CPUs
> > 64GB RAM
> > 4x DC S3610 800GB SSDs
> > Infiniband (IPoIB) network
> >
> > Ceph: no tuning or significant/relevant config changes, OSD FS is
> > Ext4, Ceph journal is inline (journal file).
> >
> > Performance:
> > A test run with "rados -p cache bench 30 write -t 32" (4MB blocks)
> > gives me about 620MB/s; the storage nodes are I/O bound (all SSDs are
> > 100% busy according to atop), and this meshes nicely with the speeds
> > I saw when testing the individual SSDs with fio before involving Ceph.
> >
> > To elaborate on that, an individual SSD of that type can do about
> > 500MB/s sequential writes, so ideally you would see 1GB/s writes with
> > Ceph (500 * 8 / 2 (replication) / 2 (journal on same disk)).
> > However, my experience tells me that other activities (FS journals,
> > leveldb PG updates, etc.) impact things as well.
> >
> > A test run with "rados -p cache bench 30 write -t 32 -b 4096" (4KB
> > blocks) gives me about 7200 IOPS; the SSDs are about 40% busy.
> > All OSD processes are using about 2 cores and the OS another 2, but
> > that leaves about 6 cores unused (MHz on all cores scales to max
> > during the test run).
> > Closer inspection with all CPUs being displayed in atop shows that no
> > single core is fully used; they all average around 40%, and even the
> > busiest ones (handling IRQs) still have ample capacity available.
> > I'm wondering if this is an indication of insufficient parallelism or
> > if it's latency of some sort.
> > I'm aware of the many tuning settings for SSD-based OSDs; however, I
> > was expecting to run into a CPU wall first and foremost.
> >
> >
> > Write amplification:
> > 10-second rados bench with 4MB blocks, 6348MB written in total.
> > NAND writes per SSD: 118 * 32MB = 3776MB.
> > 30208MB total written to all SSDs.
> > Amplification: 4.75
> >
> > Very close to what you would expect with a replication of 2 and the
> > journal on the same disk.
> >
> >
> > 10-second rados bench with 4KB blocks, 219MB written in total.
> > NAND writes per SSD: 41 * 32MB = 1312MB.
> > 10496MB total written to all SSDs.
> > Amplification: 48!!!
> >
> > Le ouch.
> > In my use case, with rbd cache on all VMs, I expect writes to be
> > rather large for the most part and not like this extreme example.
> > But as I wrote the last time I did this kind of testing, this is an
> > area where caveat emptor most definitely applies when planning and
> > buying SSDs. And where the Ceph code could probably do with some
> > attention.
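
For anyone who wants to redo the math quoted above, here it is as a
quick Python sketch. Purely illustrative: the 500MB/s per-SSD figure,
the 32MB NAND-write units and the bench totals are just the numbers
from the post, while the 8 SSDs, replication of 2 and
journal-on-the-same-disk factors are assumptions specific to this
particular setup.

SSDS = 8                  # 2 nodes x 4 SSDs
REPLICATION = 2
JOURNAL_FACTOR = 2        # journal file on the same SSD doubles writes
PER_SSD_SEQ_MBS = 500     # per-SSD sequential write speed seen with fio
NAND_UNIT_MB = 32         # the drives count NAND writes in 32MB units

# Theoretical ceiling for large sequential writes across the pool:
ceiling = PER_SSD_SEQ_MBS * SSDS / REPLICATION / JOURNAL_FACTOR
print("expected ceiling: %d MB/s" % ceiling)          # 1000 MB/s

def amplification(nand_units_per_ssd, client_mb):
    """Total NAND MB written on all SSDs divided by client MB written."""
    return nand_units_per_ssd * NAND_UNIT_MB * SSDS / float(client_mb)

print("4MB bench: %.2fx" % amplification(118, 6348))  # ~4.76x
print("4KB bench: %.2fx" % amplification(41, 219))    # ~47.9x

Nothing fancy, but it makes it easy to plug in a different drive count
or a dedicated journal device (JOURNAL_FACTOR = 1) and see where the
ceiling moves.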
> >
> > Regards,
> >
> > Christian
> > --
> > Christian Balzer        Network/Systems Engineer
> > chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> > http://www.gol.com/


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com