Re: SSD randwrite performance

Christian Balzer <chibi@xxxxxxx> · Wed, 25 May 2016 11:45:29 +0900

Hello,

On Tue, 24 May 2016 21:20:49 +0300 Max A. Krasilnikov wrote:

> Hello!
> 
> I have cluster with 5 SSD drives as OSD backed by SSD journals, one per
> osd. One osd per node.
> 
More details will help identify other potential bottlenecks, such as:
CPU/RAM
Kernel, OS version.

> Data drives is Samsung 850 EVO 1TB, journals are Samsung 850 EVO 250G,
> journal partition is 24GB, data partition is 790GB. OSD nodes connected
> by 2x10Gbps linux bonding for data/cluster network.
>
As Oliver wrote, these SSDs are totally unsuited for use with Ceph,
especially as journals.
But also in general, since they don't handle IOPS in a consistent,
predictable manner.
And they're not durable enough (endurance, TBW) either.
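
To put the endurance point into numbers, here is a minimal sketch in
Python. It assumes a steady write rate and that, with filestore, every
write to an OSD goes through its journal first, so the journal SSD sees
at least as much write traffic as the OSD it fronts. The function name,
its parameters and the example figures are made up for illustration;
take the TBW rating from the datasheet of the actual drive.

```python
def days_until_tbw_exhausted(rated_tbw_tb: float,
                             writes_gb_per_day: float,
                             write_amplification: float = 1.0) -> float:
    """Days until the rated TBW is used up at a steady write rate.

    rated_tbw_tb        -- vendor endurance rating in terabytes written
    writes_gb_per_day   -- data written to this device per day, in GB
    write_amplification -- assumed multiplier for internal flash writes
                           (1.0 means none; a guess unless measured)
    """
    if writes_gb_per_day <= 0:
        raise ValueError("writes_gb_per_day must be positive")
    device_writes_tb_per_day = writes_gb_per_day * write_amplification / 1000.0
    return rated_tbw_tb / device_writes_tb_per_day


if __name__ == "__main__":
    # Illustrative numbers only: 75 TB rating, 500 GB written per day.
    print(days_until_tbw_exhausted(75, 500))
```

With those illustrative inputs the rating is used up in 150 days, which
is why the TBW figure matters much more for a journal than for a
desktop drive.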

When using SSDs or NVMes, use DC level ones exclusively. Intel is the more
tested one in these parts, but the Samsung DC level ones ought to be fine,
too.
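
A commonly cited way to check whether a drive is fit for journal duty is
a single-threaded 4k sync write test with fio, since the filestore
journal writes with direct and sync I/O as far as I know. Below is a
minimal sketch that only builds the fio command line; the helper name,
the job name and the /dev/sdX placeholder are assumptions for
illustration. Running the resulting command against a raw device
destroys the data on it.

```python
import shlex
from typing import List


def journal_sync_write_cmd(device: str, runtime_s: int = 60) -> List[str]:
    """Build an fio command for a 4k sequential sync write test.

    device    -- target block device or file (placeholder, WILL be overwritten)
    runtime_s -- how long fio should run, in seconds
    """
    return [
        "fio",
        "--name=journal-sync-write",  # arbitrary job name
        f"--filename={device}",
        "--rw=write",                 # sequential writes, like a journal
        "--bs=4k",
        "--direct=1",                 # bypass the page cache
        "--sync=1",                   # synchronous (O_SYNC) writes
        "--iodepth=1",                # one request in flight
        "--numjobs=1",
        f"--runtime={runtime_s}",
        "--time_based",
        "--group_reporting",
    ]


if __name__ == "__main__":
    # Print the command instead of running it; replace /dev/sdX yourself.
    print(shlex.join(journal_sync_write_cmd("/dev/sdX")))
```

Drives that post high numbers with deep queues and no sync can drop to a
small fraction of that in this test, so it is worth running before
picking a journal device.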

> When doing random write with 4k blocks with direct=1, buffered=0,
> iodepth=32..1024, ioengine=libaio from nova qemu virthost I can get no
> more than 9kiops. Randread is about 13-15 kiops.
> 
> Trouble is that randwrite not depends on iodepth. read, write can be up
> to 140kiops, randread up to 15 kiops. randwrite is always 2-9 kiops.
> 
Aside from the limitations of your SSDs, there are other factors, like CPU
utilization.
Network latency is also very important, though mostly for single-threaded
IOPS.
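
The relation between queue depth, latency and IOPS is just Little's law
(IOPS = requests in flight / latency per request). A minimal sketch,
with function names of my choosing and the 9 kiops figure taken from the
numbers quoted above; the 1 ms latency is an illustrative value, not a
measurement:

```python
def max_iops(iodepth: int, latency_s: float) -> float:
    """Upper bound on IOPS for a given queue depth and per-request latency."""
    if latency_s <= 0:
        raise ValueError("latency_s must be positive")
    return iodepth / latency_s


def implied_latency_ms(iodepth: int, iops: float) -> float:
    """Average per-request latency implied by a measured IOPS figure."""
    if iops <= 0:
        raise ValueError("iops must be positive")
    return iodepth / iops * 1000.0


if __name__ == "__main__":
    # One request in flight at 1 ms round trip caps out at 1000 IOPS.
    print(max_iops(1, 0.001))
    # If IOPS stay flat at 9000 while iodepth grows, latency grows with it.
    for depth in (32, 1024):
        print(depth, round(implied_latency_ms(depth, 9000), 1))
```

At 9000 IOPS this works out to about 3.6 ms per request at iodepth=32 and
about 114 ms at iodepth=1024, i.e. the extra requests are just waiting
in a queue somewhere rather than being served in parallel.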

> Ceph cluster is mixed of jewel and hammer, upgrading now to jewel. On
> Hammer I got the same results.
> 
Mixed is a very bad state for a cluster to be in.

Jewel has lots of improvements in that area, but without decent hardware
you may not see them.

> All journals can do up to 32kiops with the same config for fio.
> 
> I am confused because EMC ScaleIO can do much more iops what is boring
> my boss :)
> 
There are lots of discussions and slides on how to improve/maximize IOPS
with Ceph, go search for them.

Fast CPUs, jemalloc, pinning, configuration, NVMes for journals, etc.

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/