[SSD NVM FOR JOURNAL] Performance issues

Hello!

I recently installed an Intel SSD 750 Series 400 GB PCIe 3.0 x4 in 3 of my OSD nodes.

First of all, here is a diagram describing how my cluster is laid out:

[Inline image 1]

[Inline image 2]

I primarily use my Ceph cluster as a backend for OpenStack Nova, Glance, Swift and Cinder. My CRUSH map is configured with rulesets for SAS disks, SATA disks, and another ruleset that resides on the HPE nodes, also using SATA disks.

Before installing the new journal in the HPE nodes, I was using one of the disks that are OSDs today (osd.35, osd.34 and osd.33). After upgrading the journal, I noticed that a dd command writing 1 GB blocks inside OpenStack Nova instances doubled its throughput, but the expected gain was actually 400% or 500%, since on the Dell nodes, where we have another Nova pool, the throughput is around that value.

Here is a demonstration of the scenario and the difference in performance between the Dell nodes and the HPE nodes:



Scenario:

  •    Using pools to store instance disks for OpenStack
  •    Pool nova in ruleset "SAS", placed on c4-osd201, c4-osd202 and c4-osd203, with 5 OSDs per host
  •    Pool nova_hpedl180 in ruleset "NOVA_HPEDL180", placed on c4-osd204, c4-osd205 and c4-osd206, with 3 OSDs per host
  •    Every OSD has one 35 GB partition on an Intel SSD 750 Series 400 GB PCIe 3.0 x4
  •    Dedicated 10 Gbps links for the cluster and public networks
  •    Deployment via ceph-ansible, with the same configuration defined in Ansible for every host in the cluster
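As a sanity check (these commands are my addition, not part of the original report), the pool-to-ruleset mapping described above can be verified along these lines; note that the last option is spelled crush_ruleset on pre-Luminous releases:

```shell
# Show the CRUSH hierarchy and which hosts/OSDs sit under each root
ceph osd tree

# Dump the CRUSH rules so the SAS / NOVA_HPEDL180 rules can be inspected
ceph osd crush rule dump

# Confirm which rule each pool actually uses
# ("crush_ruleset" instead of "crush_rule" on older releases)
ceph osd pool get nova crush_rule
ceph osd pool get nova_hpedl180 crush_rule
```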


Instance on pool nova in ruleset SAS:
    
    
   # dd if=/dev/zero of=/mnt/bench bs=1G count=1 oflag=direct
       1+0 records in
       1+0 records out
       1073741824 bytes (1.1 GB, 1.0 GiB) copied, 2.56255 s, 419 MB/s


Instance on pool nova_hpedl180 in ruleset NOVA_HPEDL180:

   # dd if=/dev/zero of=/mnt/bench bs=1G count=1 oflag=direct
       1+0 records in
       1+0 records out
       1073741824 bytes (1.1 GB, 1.0 GiB) copied, 11.8243 s, 90.8 MB/s
    

I ran some fio benchmarks as suggested by Sébastien Han ( https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/ ), and with 1 job the command returned about 180 MB/s of throughput on the recently installed nodes (the HPE nodes). I also ran hdparm benchmarks on all the SSDs and everything seems normal.
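For reference, the journal test from that blog post looks roughly like this (the device path is a placeholder; adjust it to the SSD partition under test, and be aware this writes directly to the device):

```shell
# Journal-suitability test per Sebastien Han's post: O_DIRECT + O_DSYNC
# sequential 4k writes with a single job, mimicking Ceph journal writes.
# WARNING: destructive to data on the target; /dev/sdX is a placeholder.
fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based \
    --group_reporting --name=journal-test
```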


I can't see what is causing this difference in throughput, since the network is not a problem, and I don't think CPU or memory are crucial either: I was monitoring the cluster with atop and didn't notice any saturation of resources. My only thought is that I have less workload on the nova_hpedl180 pool in the HPE nodes, and fewer disks per node, and that this can influence the throughput of the journal.
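To put a rough number on that last hypothesis, here is a back-of-envelope sketch. All figures are assumptions: writes spreading evenly across OSDs, a replication factor of 3 (not stated above), the ~180 MB/s single-job journal throughput from the fio test, and the SSD itself not being the bottleneck:

```python
# Back-of-envelope ceiling on client write throughput per pool, assuming
# writes fan out evenly and every replica write hits a journal partition.
JOURNAL_MBPS = 180   # per-partition sequential write, from the 1-job fio run
REPLICATION = 3      # assumed pool size; adjust to the real value

def pool_ceiling(hosts, osds_per_host, journal_mbps=JOURNAL_MBPS,
                 replication=REPLICATION):
    """Rough upper bound (MB/s) on aggregate client writes for one pool."""
    total_journal = hosts * osds_per_host * journal_mbps
    return total_journal / replication

sas = pool_ceiling(3, 5)   # ruleset SAS: 3 hosts x 5 OSDs
hpe = pool_ceiling(3, 3)   # ruleset NOVA_HPEDL180: 3 hosts x 3 OSDs
print(sas, hpe)            # 900.0 540.0
```

By this crude estimate the HPE pool should be roughly 40% slower than the SAS pool, not 4-5x slower, so the OSD count alone probably doesn't explain the whole gap.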


Any clue about what is missing or what is happening?

Thanks in advance.
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
