Re: rados block on SSD - performance - how to tune and get insight?


 




On 2/7/19 8:41 AM, Brett Chancellor wrote:
> This seems right. You are doing a single benchmark from a single client.
> Your limiting factor will be the network latency. For most networks this
> is between 0.2 and 0.3ms. If you're trying to test the potential of
> your cluster, you'll need multiple workers and clients.
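> 
> For example (just a sketch; the -t value and the "clientN" run names
> are placeholders), run something like this in parallel from several
> client hosts:
> 
>   sudo rados bench -p scbench -b 4096 -t 64 60 write \
>       --run-name client1 --no-cleanup
>   sudo rados bench -p scbench -b 4096 -t 64 60 write \
>       --run-name client2 --no-cleanup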
> 

Indeed. To add to this, you will need fast (high clock speed!) CPUs in
order to get the latency down. The CPUs will need tuning as well, such
as their power profiles and C-states.
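
For example (just a sketch; the exact tooling differs per distro), on
each OSD host something like:

  # run all cores with the performance governor
  cpupower frequency-set -g performance

  # and keep the cores out of deep C-states, e.g. by booting with
  # "intel_idle.max_cstate=0 processor.max_cstate=1" on the kernel
  # command line, or by holding /dev/cpu_dma_latency open at 0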

You won't get 1:1 performance from the SSDs on your RBD block devices.

Wido

> On Thu, Feb 7, 2019, 2:17 AM <jesper@xxxxxxxx> wrote:
> 
>     Hi List
> 
>     We are in the process of moving to the next use case for our Ceph
>     cluster. Bulk, cheap, slow, erasure-coded CephFS storage was the
>     first - and that works fine.
> 
>     We're currently on luminous / bluestore; if upgrading is deemed
>     likely to change what we're seeing, please let us know.
> 
>     We have 6 OSD hosts, each with a single 1TB S4510 SSD, connected
>     through an H700 MegaRAID PERC BBWC in EachDiskRaid0 mode - with the
>     scheduler set to deadline, nomerges = 1, rotational = 0.
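> 
>     (For reference, those per-disk settings are applied roughly like
>     this, with sdX as a placeholder for each SSD:)
> 
>     echo deadline > /sys/block/sdX/queue/scheduler
>     echo 1 > /sys/block/sdX/queue/nomerges
>     echo 0 > /sys/block/sdX/queue/rotational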
> 
>     Each disk "should" give approximately 36K IOPS random write and
>     double that for random read.
> 
>     The pool is set up with 3x replication. We would like a "scale-out"
>     setup of well-performing SSD block devices - potentially to host
>     databases and things like that. I read through this nice document
>     [0]; I know the HW is radically different from mine, but I still
>     think I'm at the very low end of what 6 x S4510 should be capable
>     of doing.
> 
>     Since it is IOPS I care about, I have lowered the block size to
>     4096 -- a 4M block size nicely saturates the NICs in both
>     directions.
> 
> 
>     $ sudo rados bench -p scbench -b 4096 10 write --no-cleanup
>     hints = 1
>     Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
>     Object prefix: benchmark_data_torsk2_11207
>       sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>         0       0         0         0         0         0           -           0
>         1      16      5857      5841   22.8155   22.8164  0.00238437  0.00273434
>         2      15     11768     11753   22.9533   23.0938   0.0028559  0.00271944
>         3      16     17264     17248   22.4564   21.4648  0.00246666  0.00278101
>         4      16     22857     22841   22.3037   21.8477    0.002716  0.00280023
>         5      16     28462     28446   22.2213   21.8945  0.00220186    0.002811
>         6      16     34216     34200   22.2635   22.4766  0.00234315  0.00280552
>         7      16     39616     39600   22.0962   21.0938  0.00290661  0.00282718
>         8      16     45510     45494   22.2118   23.0234   0.0033541  0.00281253
>         9      16     50995     50979   22.1243   21.4258  0.00267282  0.00282371
>        10      16     56745     56729   22.1577   22.4609  0.00252583   0.0028193
>     Total time run:         10.002668
>     Total writes made:      56745
>     Write size:             4096
>     Object size:            4096
>     Bandwidth (MB/sec):     22.1601
>     Stddev Bandwidth:       0.712297
>     Max bandwidth (MB/sec): 23.0938
>     Min bandwidth (MB/sec): 21.0938
>     Average IOPS:           5672
>     Stddev IOPS:            182
>     Max IOPS:               5912
>     Min IOPS:               5400
>     Average Latency(s):     0.00281953
>     Stddev Latency(s):      0.00190771
>     Max latency(s):         0.0834767
>     Min latency(s):         0.00120945
> 
>     Min latency is fine -- but Max latency of 83ms?
>     Average IOPS @ 5672?
> 
>     $ sudo rados bench -p scbench  10 rand
>     hints = 1
>       sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
>         0       0         0         0         0         0           -           0
>         1      15     23329     23314   91.0537   91.0703 0.000349856 0.000679074
>         2      16     48555     48539   94.7884   98.5352 0.000499159 0.000652067
>         3      16     76193     76177   99.1747   107.961 0.000443877 0.000622775
>         4      15    103923    103908   101.459   108.324 0.000678589 0.000609182
>         5      15    132720    132705   103.663   112.488 0.000741734 0.000595998
>         6      15    161811    161796   105.323   113.637 0.000333166 0.000586323
>         7      15    190196    190181   106.115   110.879 0.000612227 0.000582014
>         8      15    221155    221140   107.966   120.934 0.000471219 0.000571944
>         9      16    251143    251127   108.984   117.137 0.000267528 0.000566659
>     Total time run:       10.000640
>     Total reads made:     282097
>     Read size:            4096
>     Object size:          4096
>     Bandwidth (MB/sec):   110.187
>     Average IOPS:         28207
>     Stddev IOPS:          2357
>     Max IOPS:             30959
>     Min IOPS:             23314
>     Average Latency(s):   0.000560402
>     Max latency(s):       0.109804
>     Min latency(s):       0.000212671
> 
>     This is also quite far from expected. I have 12GB of memory for the
>     OSD daemon on each host for caching - close to idle cluster - thus
>     50GB+ for caching with a working set of < 6GB. In this case it
>     should not really be bound by the underlying SSDs. But if it were:
> 
>     IOPS/disk * num disks / replication => 95K * 6 / 3 => 190K or 6x off?
> 
>     There is no measurable service time in iostat when running the
>     tests, so I have come to the conclusion that it has to be either
>     the client side, the network path, or the OSD daemon that delivers
>     the increasing latency / decreased IOPS.
> 
>     Are there any suggestions on how to get more insight into that?
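> 
>     (As a starting point, the admin socket on each OSD host at least
>     shows how much of the latency is spent inside the daemon itself:)
> 
>     ceph daemon osd.0 perf dump          # op_latency, op_w_latency, ...
>     ceph daemon osd.0 dump_historic_ops  # slowest recent ops with timestamps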
> 
>     Has anyone come close to replicating the numbers Micron is
>     reporting on NVMe?
> 
>     Thanks a lot.
> 
>     [0]
>     https://www.micron.com/-/media/client/global/documents/products/other-documents/micron_9200_max_ceph_12,-d-,2,-d-,8_luminous_bluestore_reference_architecture.pdf?la=en
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



