Re: Squeezing Performance of CEPH

Hi Ashley,

I already know; I was expecting the bottleneck to be the minimum of network bandwidth and disk throughput (and in my first email it was the disks).
I still think the write speed is too low.

I have read that removing the journal is not a good idea.
However, I'm writing twice to the same SSD... which does not seem like a good idea either.
How is it possible to remove this overhead?
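For example, would moving the filestore journal to a separate device be the way to go? A minimal ceph-disk sketch of what I mean (device names are just placeholders):

   # data device first, journal device second (placeholders)
   ceph-disk prepare /dev/sdb /dev/sdc
   ceph-disk activate /dev/sdb1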



On 22/06/2017 19:47, Ashley Merrick wrote:
Hello,

Also, as Mark put it, one minute you're testing bandwidth capacity, the next minute you're testing disk capacity.

There is no way a small set of SSDs is going to be able to max out your current bandwidth, even if you removed the Ceph/journal overhead. I would say the speeds you are getting are what you should expect, and in line with what I've seen in many other setups.

Ashley

Sent from my iPhone

On 23 Jun 2017, at 12:42 AM, Mark Nelson <mnelson@xxxxxxxxxx> wrote:

Hello Massimiliano,

Based on the configuration below, it appears you have 8 SSDs total (2 nodes with 4 SSDs each)?

I'm going to assume you have 3x replication and are using filestore, so in reality you are writing 3 copies and doing full data journaling for each copy, i.e. 6x writes per client write.  Taking this into account, your per-SSD throughput should be somewhere around:

Sequential write:
~600 * 3 (copies) * 2 (journal write per copy) / 8 (ssds) = ~450MB/s

Sequential read:
~3000 / 8 (ssds) = ~375MB/s

Random read:
~3337 / 8 (ssds) = ~417MB/s
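
Or, spelled out as a quick shell check (the 600 / 3000 / 3337 MB/s figures are the cluster-wide rados bench numbers from your mail; 3x replication and the 2x journal write are my assumptions above):

   echo "seq write per SSD:  $(( 600 * 3 * 2 / 8 )) MB/s"   # ~450
   echo "seq read per SSD:   $(( 3000 / 8 )) MB/s"          # ~375
   echo "rand read per SSD:  $(( 3337 / 8 )) MB/s"          # ~417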

These numbers are pretty reasonable for SATA-based SSDs, though the read throughput is a little low.  You didn't include the model of SSD, but if you look at Intel's DC S3700, which is a fairly popular SSD for Ceph:

https://www.intel.com/content/www/us/en/solid-state-drives/ssd-dc-s3700-spec.html

Sequential read is up to ~500 MB/s and sequential write up to ~460 MB/s.  Not too far off from what you are seeing.  You might try playing with readahead on the OSD devices to see if that improves things at all.  Still, unless I've missed something, these numbers aren't terrible.
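
Something along these lines as a starting point (sdb-sde are placeholders for your actual OSD devices, and the right value depends on your workload):

   # readahead in 512-byte sectors (4096 sectors = 2 MiB)
   for dev in sdb sdc sdd sde; do
       blockdev --setra 4096 /dev/$dev
   done
   # or equivalently via sysfs, in KiB
   echo 2048 > /sys/block/sdb/queue/read_ahead_kb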

Mark

On 06/22/2017 12:19 PM, Massimiliano Cuttini wrote:
Hi everybody,

I want to squeeze all the performance out of Ceph (we are using Jewel 10.2.7).
We are evaluating a test environment with 2 nodes, both with the same
configuration:

 * CentOS 7.3
 * 24 CPUs (12 physical cores with hyper-threading)
 * 32 GB of RAM
 * 2x 100 Gbit/s Ethernet cards
 * 2x SSDs dedicated to the OS, in RAID
 * 4x SATA 6 Gbit/s SSDs for OSDs

We are already expecting the following bottlenecks:

 * [ SATA speed x n° of disks ] = 24 Gbit/s
 * [ Network speed x n° of bonded cards ] = 200 Gbit/s

So the minimum of the two is 24 Gbit/s per node (not taking protocol
overhead into account).

24 Gbit/s per node x 2 nodes = 48 Gbit/s of maximum hypothetical gross
speed (mhs).
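
The same arithmetic as a quick check:

   echo "per-node disk ceiling:    $(( 6 * 4 )) Gbit/s"      # SATA 6 Gbit/s x 4 OSD SSDs
   echo "per-node network ceiling: $(( 100 * 2 )) Gbit/s"    # 2x 100 Gbit/s bonded
   echo "cluster gross ceiling:    $(( 6 * 4 * 2 )) Gbit/s"  # min(24, 200) x 2 nodes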

Here are the tests:
/////// IPERF2 ///////
Tests are quite good, scoring 88% of the bottleneck.
Note: a single iperf2 connection can use only one link of a bond (it's a well-known issue).
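The run therefore used several parallel streams so the traffic could spread across the bond; the invocation was along these lines (exact options may have differed):

   # client side, 10 parallel streams for 10 seconds (the server runs "iperf -s")
   iperf -c <server-ip> -P 10 -t 10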

   [ ID] Interval       Transfer     Bandwidth
   [ 12]  0.0-10.0 sec  9.55 GBytes  8.21 Gbits/sec
   [  3]  0.0-10.0 sec  10.3 GBytes  8.81 Gbits/sec
   [  5]  0.0-10.0 sec  9.54 GBytes  8.19 Gbits/sec
   [  7]  0.0-10.0 sec  9.52 GBytes  8.18 Gbits/sec
   [  6]  0.0-10.0 sec  9.96 GBytes  8.56 Gbits/sec
   [  8]  0.0-10.0 sec  12.1 GBytes  10.4 Gbits/sec
   [  9]  0.0-10.0 sec  12.3 GBytes  10.6 Gbits/sec
   [ 10]  0.0-10.0 sec  10.2 GBytes  8.80 Gbits/sec
   [ 11]  0.0-10.0 sec  9.34 GBytes  8.02 Gbits/sec
   [  4]  0.0-10.0 sec  10.3 GBytes  8.82 Gbits/sec
   [SUM]  0.0-10.0 sec   103 GBytes  88.6 Gbits/sec

/////// RADOS BENCH ///////

Taking into consideration the maximum hypothetical speed of 48 Gbit/s
(due to the disk bottleneck), the test results are not good enough:

 * Average write bandwidth is almost 5-7 Gbit/s (12.5% of the mhs)
 * Average sequential read bandwidth is almost 24 Gbit/s (50% of the mhs)
 * Average random read bandwidth is almost 27 Gbit/s (56.25% of the mhs)
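
(Rough conversion from the rados bench MB/s figures below, taking 1 MB/s ≈ 8 Mbit/s:)

   echo "write:     $(echo "601.406 * 8 / 1000" | bc -l) Gbit/s"   # ~4.8
   echo "seq read:  $(echo "2994.61 * 8 / 1000" | bc -l) Gbit/s"   # ~24.0
   echo "rand read: $(echo "3337.71 * 8 / 1000" | bc -l) Gbit/s"   # ~26.7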

Here are the reports.
Write:

   # rados bench -p scbench 10 write --no-cleanup
   Total time run:         10.229369
   Total writes made:      1538
   Write size:             4194304
   Object size:            4194304
   Bandwidth (MB/sec):     601.406
   Stddev Bandwidth:       357.012
   Max bandwidth (MB/sec): 1080
   Min bandwidth (MB/sec): 204
   Average IOPS:           150
   Stddev IOPS:            89
   Max IOPS:               270
   Min IOPS:               51
   Average Latency(s):     0.106218
   Stddev Latency(s):      0.198735
   Max latency(s):         1.87401
   Min latency(s):         0.0225438

Sequential read:

   # rados bench -p scbench 10 seq
   Total time run:       2.054359
   Total reads made:     1538
   Read size:            4194304
   Object size:          4194304
   Bandwidth (MB/sec):   2994.61
   Average IOPS          748
   Stddev IOPS:          67
   Max IOPS:             802
   Min IOPS:             707
   Average Latency(s):   0.0202177
   Max latency(s):       0.223319
   Min latency(s):       0.00589238

Random read:

   # rados bench -p scbench 10 rand
   Total time run:       10.036816
   Total reads made:     8375
   Read size:            4194304
   Object size:          4194304
   Bandwidth (MB/sec):   3337.71
   Average IOPS:         834
   Stddev IOPS:          78
   Max IOPS:             927
   Min IOPS:             741
   Average Latency(s):   0.0182707
   Max latency(s):       0.257397
   Min latency(s):       0.00469212

//------------------------------------

It seems like there is some bottleneck somewhere that we are
underestimating.
Can you help me find it?






_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

