Re: Ceph + VMware + Single Thread Performance

What I don't really understand is this:

Let's say the Intel P3700 does 200 MByte/s in a single-threaded rados bench; see Nick's results below.

Now take a cluster with multiple OSD nodes, for example 10 nodes.

Every node has exactly one P3700 NVMe built in.

Why is the single-thread performance on the RBD client still exactly 200 MByte/s with a 10-node OSD cluster?

I would expect 10 nodes * 200 MByte/s = 2000 MByte/s.
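
(Back-of-envelope, using Nick's bench output below: with -t 1 there is only ever one 4 MByte object in flight, so

    throughput = object size / per-op latency = 4 MByte / 0.0193 s ≈ 207 MByte/s

which matches his reported 206.842 MB/sec almost exactly. More nodes change where each object lands, but not the latency of each individual write.)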


Have a look at your own cluster:

dstat -D sdb,sdc,sdd,sdX ...

You will see that Ceph stripes the data over all OSDs in the cluster if you test from the client side with rados bench:

rados bench -p rbd 60 write -b 4M -t 1
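
To see where the individual objects land, something along these lines works (run the bench with --no-cleanup so the objects stick around; rados bench names them benchmark_data_<host>_<pid>_object<N>, so the name below is just an example):

rados -p rbd ls | head
ceph osd map rbd benchmark_data_myhost_12345_object0

ceph osd map prints the PG and the set of OSDs that each object maps to.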



Am 21.07.16 um 14:38 schrieb wr@xxxxxxxx:
Is there not a way to enable the Linux page cache, i.e. to avoid O_DSYNC writes?

That would improve performance dramatically.
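
(The closest client-side knob I know of is RBD caching in librbd; a minimal sketch for ceph.conf on the gateways, assuming tgt's rbd backend goes through librbd:

[client]
rbd cache = true
rbd cache writethrough until flush = true
rbd cache size = 67108864    # 64 MB per client, adjust to taste

Whether it helps will depend on the initiator not forcing a flush after every IO.)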


Am 21.07.16 um 14:33 schrieb Nick Fisk:
-----Original Message-----
From: wr@xxxxxxxx [mailto:wr@xxxxxxxx]
Sent: 21 July 2016 13:23
To: nick@xxxxxxxxxx; 'Horace Ng' <horace@xxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Ceph + VMware + Single Thread Performance

Okay, and what is your plan now to speed things up?
Now that I have come up with a lower-latency hardware design, there is not much further improvement to be had until persistent RBD caching is implemented, as that will move the SSD/NVMe closer to the client. But I'm happy with what I can achieve at the moment. You could also experiment with bcache on the RBD; a sketch follows.
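
A rough sketch of the bcache idea, assuming /dev/rbd0 is the mapped image and /dev/nvme0n1p1 is a spare local NVMe partition (both make-bcache calls are destructive, so only on fresh devices):

make-bcache -C /dev/nvme0n1p1      # format the NVMe partition as the cache device
make-bcache -B /dev/rbd0           # format the RBD as the backing device
echo /dev/nvme0n1p1 > /sys/fs/bcache/register
echo /dev/rbd0 > /sys/fs/bcache/register
echo <cache-set-uuid> > /sys/block/bcache0/bcache/attach   # UUID from bcache-super-show
echo writeback > /sys/block/bcache0/bcache/cache_mode      # writeback is what hides the write latency

Bear in mind that with writeback, a dead gateway means dirty data stranded on its local NVMe.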

Would it help to put multiple P3700s in each OSD node to improve performance for a single thread (for example, Storage vMotion)?
Most likely not; it's all the other parts of the puzzle that are causing the latency. ESXi was designed for storage arrays that service IOs in the 100us-1ms range; Ceph is probably about 10x slower than that, hence the problem. Disable the BBWC on a RAID controller or SAN and you will see the same behaviour.
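
If you want to put a number on that latency yourself, a QD1 fio run against an image is a reasonable sketch (assuming fio was built with the rbd engine and a test image called 'testimg' already exists):

fio --name=qd1-write --ioengine=rbd --clientname=admin --pool=rbd \
    --rbdname=testimg --rw=write --bs=64k --iodepth=1 --runtime=60 --time_based

Compare the completion latencies it reports against the 100us-1ms window ESXi was designed around.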

Regards


Am 21.07.16 um 14:17 schrieb Nick Fisk:
-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf
Of wr@xxxxxxxx
Sent: 21 July 2016 13:04
To: nick@xxxxxxxxxx; 'Horace Ng' <horace@xxxxxxxxx>
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Ceph + VMware + Single Thread Performance

Hi,

Hmm, I think 200 MByte/s is really bad. Is your cluster in production right now?
It's just been built, not running yet.

So if you start a storage migration you only get 200 MByte/s, right?
I wish. My current cluster (not this new one) would storage-migrate at
~10-15MB/s. Serial latency is the problem: without being able to
buffer, ESXi waits on an ack for each IO before sending the next. It also submits the migrations in 64KB chunks, unless you get VAAI
working. I think ESXi will try to do them in parallel, which will help as well.
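(The numbers line up: with one 64KB IO in flight, 64KB / 4ms ≈ 16MB/s and 64KB / 6ms ≈ 11MB/s, so ~4-6ms of end-to-end latency per write is exactly that 10-15MB/s range.)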
I think it would be awesome if you got 1000 MByte/s.

Where is the bottleneck?
Latency serialisation: without a buffer, you can't drive the devices
to 100%. With buffered IO (or high queue depths) I can max out the journals.

A fio test from Sebastien Han gives us 400 MByte/s raw performance from the P3700:

https://www.sebastien-han.fr/blog/2014/10/10/ceph-how-to-test-if-your-ssd-is-suitable-as-a-journal-device/
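
For reference, the journal test from that post is along these lines (synchronous 4k writes straight at the device; it is destructive, and the post varies numjobs):

fio --filename=/dev/nvme0n1 --direct=1 --sync=1 --rw=write --bs=4k \
    --numjobs=1 --iodepth=1 --runtime=60 --time_based --group_reporting --name=journal-test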

How can it be that the RBD client performance is 50% slower?

Regards


Am 21.07.16 um 12:15 schrieb Nick Fisk:
I've had a lot of pain with this; smaller block sizes are even worse.
You want to minimise latency at every point, as there is no
buffering happening in the iSCSI stack. This means:

1. Fast journals (NVMe or NVRAM)
2. 10Gb or better networking
3. Fast CPUs (GHz)
4. Pin CPU C-states to C1
5. Pin the CPU frequency to its maximum (see the sketch below for 4 and 5)
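
A sketch of one way to do points 4 and 5, assuming an Intel box with the cpupower tool installed:

cpupower frequency-set -g performance   # lock the governor to the performance profile
cpupower idle-set -D 2                  # disable idle states deeper than ~2us exit latency, so C1 stays

or, permanently, on the kernel command line: intel_idle.max_cstate=1 processor.max_cstate=1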

Also, I can't be sure, but I think there is a metadata update
happening with VMFS, particularly if you are using thin VMDKs; this
can also be a major bottleneck. For my use case, I've switched over to NFS, as it has given much more performance at scale and less headache.

For the RADOS run, here you go (400GB P3700):

Total time run:         60.026491
Total writes made:      3104
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     206.842
Stddev Bandwidth:       8.10412
Max bandwidth (MB/sec): 224
Min bandwidth (MB/sec): 180
Average IOPS:           51
Stddev IOPS:            2
Max IOPS:               56
Min IOPS:               45
Average Latency(s):     0.0193366
Stddev Latency(s):      0.00148039
Max latency(s):         0.0377946
Min latency(s):         0.015909

Nick

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On
Behalf Of Horace
Sent: 21 July 2016 10:26
To: wr@xxxxxxxx
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: Ceph + VMware + Single Thread Performance

Hi,

Same here. I've read blog posts saying that VMware will frequently
verify the locking on VMFS over iSCSI, hence it has much slower performance than NFS (which uses a different locking mechanism).

Regards,
Horace Ng

----- Original Message -----
From: wr@xxxxxxxx
To: ceph-users@xxxxxxxxxxxxxx
Sent: Thursday, July 21, 2016 5:11:21 PM
Subject: Ceph + VMware + Single Thread Performance

Hi everyone,

we are seeing relatively slow single-thread performance on the iSCSI nodes of our cluster.


Our setup:

3 racks:

18x data nodes, 3 mon nodes, 3 iSCSI gateway nodes with tgt (rbd cache off).

2x Samsung SM863 enterprise SSDs for journals (3 OSDs per SSD) and 6x
WD Red 1TB drives per data node as OSDs.

Replication = 3

chooseleaf type rack in the CRUSH map (one replica per rack)


We only get ca. 90 MByte/s on the iSCSI gateway servers with:

rados bench -p rbd 60 write -b 4M -t 1


If we test with:

rados bench -p rbd 60 write -b 4M -t 32

we get ca. 600 - 700 MByte/s


We plan to replace the Samsung SSDs with Intel DC P3700 PCIe NVMe
drives for the journals, to get better single-thread performance.

Is anyone out there running an Intel P3700 as a journal who
can share test results for:


rados bench -p rbd 60 write -b 4M -t 1


Thank you very much !!

Kind Regards !!




_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
