Hello!

On Wed, May 25, 2016 at 11:45:29AM +0900, chibi wrote:

> Hello,
>
> On Tue, 24 May 2016 21:20:49 +0300 Max A. Krasilnikov wrote:
>
>> Hello!
>>
>> I have a cluster with 5 SSD drives as OSDs, backed by SSD journals, one
>> per OSD. One OSD per node.
>>
> More details will help identify other potential bottlenecks, such as:
> CPU/RAM
> Kernel, OS version.

For now I have 3x (OpenStack controller + Ceph mon + 8x OSD, one per SSD).
All running Ubuntu 14.04 + Hammer from ubuntu-cloud, now moving to
Ubuntu 14.04 + Ceph Jewel from the Ceph site.
E5-2620 v2 (12 cores)
32G RAM
Linux 4.2.0, moving to 4.4 from Xenial.

>> Data drives are Samsung 850 EVO 1TB, journals are Samsung 850 EVO 250G;
>> the journal partition is 24GB, the data partition is 790GB. OSD nodes are
>> connected by 2x10Gbps Linux bonding for the data/cluster network.
>>
> As Oliver wrote, these SSDs are totally unsuited for usage with Ceph,
> especially regarding journals.
> But also in general, since they don't handle IOPS in a consistent,
> predictable manner.
> And they're not durable (endurance, TBW) enough either.

Yep, I understand. But on a second cluster with ScaleIO they do much
better :(

> When using SSDs or NVMes, use DC level ones exclusively. Intel is the
> most tested one in these parts, but the Samsung DC level ones ought to
> be fine, too.

I can hope my employer will provide me with them, but for now I have to do
the best I can with the current hardware :(

>
>> When doing random writes with 4k blocks with direct=1, buffered=0,
>> iodepth=32..1024, ioengine=libaio from a Nova qemu virthost, I can get no
>> more than 9 kiops. Randread is about 13-15 kiops.
>>
>> The trouble is that randwrite does not depend on iodepth. read and write
>> can be up to 140 kiops, randread up to 15 kiops, but randwrite is always
>> 2-9 kiops.
>>
> Aside from the limitations of your SSDs, there are other factors, like CPU
> utilization.
> And, very importantly, network latency, but that mostly affects
> single-threaded IOPS.

>> The Ceph cluster is a mix of Jewel and Hammer, now being upgraded to
>> Jewel. On Hammer I got the same results.
>>
> Mixed is a very bad state for a cluster to be in.
> Jewel has lots of improvements in that area, but without decent hardware
> you may not see them.

My cluster is upgrading now: 2 OSDs per night :), one node per week, with
the old 850 EVOs being replaced by new ones.

>> All journals can do up to 32 kiops with the same fio config.
>>
>> I am confused because EMC ScaleIO can do many more IOPS, which is
>> bothering my boss :)
>>
> There are lots of discussions and slides on how to improve/maximize IOPS
> with Ceph; go search for them.
> Fast CPUs, jemalloc, pinning, configuration, NVMes for journals, etc.

I have seen a lot of them. I will try pinning; I have never used it before.

> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
> http://www.gol.com/

--
WBR, Max A. Krasilnikov
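
For reference, the 4k random-write test discussed above corresponds to an fio
job roughly like the sketch below. Only bs, direct, buffered, iodepth and
ioengine are taken from the parameters quoted in the thread; the job name,
target filename, size, runtime and numjobs are placeholders, not the values
actually used.

; sketch of the 4k random-write benchmark discussed in the thread
; filename, size, runtime and numjobs are placeholders
[randwrite-4k]
ioengine=libaio
direct=1
buffered=0
rw=randwrite
bs=4k
iodepth=32
numjobs=1
filename=/root/fio-test.dat
size=4G
runtime=60
time_based
group_reporting

Raising the queue depth (32..1024 in the tests above) only means changing the
iodepth line; per the thread, randwrite barely responds to it, while
sequential read/write and randread scale much further.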