Hello!

On Thu, May 26, 2016 at 04:01:27PM +0900, chibi wrote:

> >>> I have cluster with 5 SSD drives as OSD backed by SSD journals, one
> >>> per osd. One osd per node.
> >>>
> >> More details will help identify other potential bottlenecks, such as:
> >> CPU/RAM
> >> Kernel, OS version.

>> For now I have 3x(Openstack controller + ceph mon + 8xOSD (one for
>> SSD)). All running Ubuntu 14.04+Hammer from ubuntu-cloud, now moving to
>> Ubuntu 14.04+Ceph Jewel from Ceph site.
>> E5-2620 v2 (12 cores)

> With SSDs faster cores are definitely better, but as said, that's not your
> main problem probably.
> Setting the governor to "performance" helps with latency.

root@storage001:~# cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
performance

The same for all processors.

> Again, have you run atop on your OSD nodes while doing those tests?
> Are the SSDs very busy (near/at 100%) or is it the CPUs?

Yes. The SSDs are sometimes over 100% busy (101% is not rare when
backfilling), but at the same time they do only 60-100 MB/s and fewer
than 1000 writes per second.

>> 32G RAM

> So this is just one OSD per node, right? Should be enough then.

I would be glad to say "yes", but no: it is 32G RAM for 8 OSDs per node
plus the OpenStack controller and OpenStack network node. Three of these
OSDs are 6TB HDDs, one is a 1TB SSD, and four are 2TB HDDs. The journals
are partitions on 2 SSDs; the first partitions of those SSDs are mirrored
with Linux mdraid (RAID 1) for the system.

>> Linux 4.2.0, moving to 4.4 from Xenial.
>>
> >>> Data drives is Samsung 850 EVO 1TB, journals are Samsung 850 EVO 250G,
> >>> journal partition is 24GB, data partition is 790GB. OSD nodes
> >>> connected by 2x10Gbps linux bonding for data/cluster network.
> >>>
> >> As Oliver wrote, these SSDs are totally unsuited for usage with Ceph,
> >> especially regarding to journals.
> >> But also in general, since they're neither handling IOPS in a
> >> consistent, predictable manner.
> >> And they're not durable (endurance, TBW) enough either.

>> Yep, I understand. But on second cluster w/ ScaleIO they do much
>> better :(
>>
> Well, if one believes the hype (mostly from EMC though) about ScaleIO it's
> n times better than Ceph and even better than sliced bread. </sarcasm>
> But even if ScaleIO code/design/architecture is so much better than Ceph,
> these SSDs are still not something you ever want to use in a production
> environment, they have unpredictable performance (and degradation
> potentially) and most of all will wear out quickly.
> Also there have been reports here with EVOs dying long long before they
> were supposed according to their wear-out levels.
> Lastly, no matter what ScaleIO does, at some point it better do a SYNC
> write to its "disks" to have a safe checkpoint, so that performance part
> of these SSDs comes to bear as well.

As I understand it, ScaleIO does not sync unbuffered writes. But my
employer believes in miracles.

>> My cluster is upgrading now. 2 OSD per night :), one node per week, with
>> changing old 850EVO to new ones.
>>
> Test again with a full Jewel cluster.

I will. And I will report.

--
WBR, Max A. Krasilnikov
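
A couple of one-liners for the checks discussed above (a rough sketch; it
assumes a standard sysfs layout and that the sysstat package is installed
for iostat):

# show the scaling governor of every core, not just cpu0
grep . /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor

# per-device utilisation, queue size and latency while a test is running
iostat -x 1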
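
And for anyone who wants to put numbers on the journal-SSD question: the
usual check is a queue-depth-1 sync write test with fio against the journal
device. This is only a sketch; /dev/sdX is a placeholder, and the run
overwrites the target, so point it at a scratch device or partition:

# DESTRUCTIVE: writes directly to the target device/partition
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test

Datacenter SSDs with power-loss protection typically sustain thousands to
tens of thousands of IOPS in this test, while consumer drives such as the
850 EVO have been reported to drop to a few hundred, which is why they hurt
so much as Ceph journals.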