Hello,

Re-cap of my new test and staging cluster:
4 nodes running latest Hammer under Debian Jessie (with sysvinit, kernel 4.6) and manually created OSDs.
InfiniBand (IPoIB) QDR (40Gb/s, about 30Gb/s effective) between all nodes.

2 HDD OSD nodes with 32GB RAM, fast enough CPUs (E5-2620 v3), 2x 200GB DC S3610s for OS and journals (2 journals per SSD), 4x 1TB 2.5" SATAs for OSDs.
For my amusement and edification the OSDs of one node are formatted with XFS, the other with EXT4 (as are all my production clusters).

The 2 SSD OSD nodes have 1x 200GB DC S3610 (OS and 4 journal partitions) and 2x 400GB DC S3610s (2x 180GB partitions each, so 8 SSD OSDs total), same specs as the HDD nodes otherwise. Also one node with XFS, the other EXT4.

Today I added the above 2 SSD nodes and created a pool (future cache tier) on them (a rough sketch of the commands is at the end of this mail).

First I did some 4MB block (default) rados bench runs (invocations also sketched below), with the following layout and results:

  200GB SSD with 2 journals:                     80+% utilization (200MB/s)
  400GB SSD, 2 OSDs, external journals (above):  50% (200MB/s)
  400GB SSD, 2 OSDs, co-located journals in FS:  100% (400MB/s)

Thus unsurprisingly 400MB/s max speed for the cluster in this configuration.

With one journal per 400GB SSD external and one internal, I got 100% usage on the 200GB journal SSD (230MB/s) and about 90% on the OSD SSDs, resulting in 430MB/s throughput.
In a production environment, however, I'd again go with in-line journals, as matching up both speed and endurance for SSD journals basically means big NVMes with the according price tag.

XFS and EXT4 didn't show any significant differences, around 400% IOwait.

The most interesting result was the 4KB rados bench on the SSD pool:
The XFS node had a 1/3 lower wait (I/O) at 40%, but also a significantly lower average throughput per SSD of 120MB/s.
The EXT4 node registered around 120% IOwait, but wrote 160MB/s per SSD on average.

I'm not sure how to interpret these numbers, though there is another interesting tidbit below in the HDD tests.
Am I seeing more EXT4 overhead, or is it actually writing more?
FIO runs on the actual FS (again, see the sketch at the end) give a slight (2%) advantage to EXT4, but nothing like what I'm seeing here.

Idle CPU during those 4KB runs is down to 100% (out of 1200%) at times, which matches the 4 OSD processes per node running at about 250% each.

Lastly the HDD OSDs, 4MB block bench (nothing outstanding with the 4KB one):
Similar throughput per drive, however according to atop the avio per HDD is 12ms with XFS and 8ms with EXT4.

Some food for thought, minor though with BlueStore in the pipeline.

Christian
-- 
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
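
P.S.: As promised above, a rough sketch of how the SSD-only pool was set up; the rule name, pool name, PG counts and the CRUSH root "ssd" are placeholders for whatever your map actually contains, not my exact values:

  # assumes a separate CRUSH root (here "ssd") that contains only the SSD hosts
  ceph osd crush rule create-simple ssd-rule ssd host
  # pool name and PG/PGP counts are placeholders, size it for your cluster
  ceph osd pool create ssd-cache 512 512 replicated ssd-rule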
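
The rados bench runs were of this general form; pool name, runtime and thread count here are again just placeholders:

  # 4MB objects (the rados bench default)
  rados bench -p ssd-cache 60 write -t 32
  # same run with 4KB objects
  rados bench -p ssd-cache 60 write -t 32 -b 4096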
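
And the fio runs against the OSD filesystems were of roughly this shape; directory, size, iodepth and runtime are illustrative, not the exact parameters I used:

  # 4KB random direct writes against one OSD's filesystem
  fio --name=fstest --directory=/var/lib/ceph/osd/ceph-0 --rw=randwrite --bs=4k \
      --direct=1 --ioengine=libaio --iodepth=32 --size=4g --runtime=60 \
      --time_based --group_reporting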