I ran your rados bench test on our SM863a pool (3x replication) and got similar results.

[@]# rados bench -p fs_data.ssd -b 4096 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_c04_1337712
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16      6302      6286   24.5533   24.5547   0.00304773    0.002541
    2      15     12545     12530   24.4705   24.3906   0.00228294   0.0025506
    3      16     18675     18659   24.2933   23.9414   0.00332918  0.00257042
    4      16     25194     25178   24.5854   25.4648    0.0034176  0.00254016
    5      16     31657     31641   24.7169   25.2461   0.00156494  0.00252686
    6      16     37713     37697   24.5398   23.6562   0.00228134  0.00254527
    7      16     43848     43832   24.4572   23.9648   0.00238393  0.00255401
    8      16     49516     49500   24.1673   22.1406   0.00244473  0.00258466
    9      16     55562     55546   24.1059   23.6172   0.00249619  0.00259139
   10      16     61675     61659   24.0829   23.8789    0.0020192  0.00259362
Total time run:         10.002179
Total writes made:      61675
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     24.0865
Stddev Bandwidth:       0.932554
Max bandwidth (MB/sec): 25.4648
Min bandwidth (MB/sec): 22.1406
Average IOPS:           6166
Stddev IOPS:            238
Max IOPS:               6519
Min IOPS:               5668
Average Latency(s):     0.00259383
Stddev Latency(s):      0.00173856
Max latency(s):         0.0778051
Min latency(s):         0.00110931

[@]# rados bench -p fs_data.ssd 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      15     27697     27682   108.115   108.133  0.000755936 0.000568212
    2      15     57975     57960   113.186   118.273  0.000547682 0.000542773
    3      15     88500     88485   115.199   119.238   0.00036749 0.000533185
    4      15    117199    117184   114.422   112.105  0.000354388 0.000536647
    5      15    147734    147719    115.39   119.277  0.000419781  0.00053221
    6      16    176393    176377   114.814   111.945  0.000427109 0.000534771
    7      15    203693    203678   113.645   106.645  0.000379089 0.000540113
    8      15    231917    231902   113.219    110.25  0.000465232 0.000542156
    9      16    261054    261038   113.284   113.812  0.000358025 0.000541972
Total time run:       10.000669
Total reads made:     290371
Read size:            4096
Object size:          4096
Bandwidth (MB/sec):   113.419
Average IOPS:         29035
Stddev IOPS:          1212
Max IOPS:             30535
Min IOPS:             27301
Average Latency(s):   0.000541371
Max latency(s):       0.00380609
Min latency(s):       0.000155521

-----Original Message-----
From: jesper@xxxxxxxx [mailto:jesper@xxxxxxxx]
Sent: 07 February 2019 08:17
To: ceph-users@xxxxxxxxxxxxxx
Subject: rados block on SSD - performance - how to tune and get insight?

Hi List

We are in the process of moving to the next use case for our Ceph cluster; bulk, cheap, slow, erasure-coded CephFS storage was the first, and that works fine. We're currently on Luminous / BlueStore; if upgrading is likely to change what we're seeing, please let us know.

We have 6 OSD hosts, each with one 1TB S4510 SSD, connected through an H700 MegaRAID PERC with BBWC, configured as EachDiskRaid0, with the scheduler set to deadline, nomerges = 1 and rotational = 0. Each disk "should" deliver approximately 36K random-write IOPS and double that for random reads. The pool is set up with 3x replication.

We would like a "scale-out" setup of well-performing SSD block devices, potentially to host databases and the like. I read through this nice document [0]; I know the HW is radically different from mine, but I still think I'm at the very low end of what 6 x S4510 should be capable of.

Since it is IOPS I care about, I have lowered the block size to 4096 -- a 4M block size nicely saturates the NICs in both directions.
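
The per-device tuning mentioned above is the usual sysfs knobs, applied roughly like this (a sketch; sdX is a placeholder for the virtual disk exposed by the controller):

echo deadline > /sys/block/sdX/queue/scheduler
echo 1 > /sys/block/sdX/queue/nomerges
echo 0 > /sys/block/sdX/queue/rotational
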
$ sudo rados bench -p scbench -b 4096 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_torsk2_11207
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      16      5857      5841   22.8155   22.8164   0.00238437  0.00273434
    2      15     11768     11753   22.9533   23.0938    0.0028559  0.00271944
    3      16     17264     17248   22.4564   21.4648   0.00246666  0.00278101
    4      16     22857     22841   22.3037   21.8477     0.002716  0.00280023
    5      16     28462     28446   22.2213   21.8945   0.00220186    0.002811
    6      16     34216     34200   22.2635   22.4766   0.00234315  0.00280552
    7      16     39616     39600   22.0962   21.0938   0.00290661  0.00282718
    8      16     45510     45494   22.2118   23.0234    0.0033541  0.00281253
    9      16     50995     50979   22.1243   21.4258   0.00267282  0.00282371
   10      16     56745     56729   22.1577   22.4609   0.00252583   0.0028193
Total time run:         10.002668
Total writes made:      56745
Write size:             4096
Object size:            4096
Bandwidth (MB/sec):     22.1601
Stddev Bandwidth:       0.712297
Max bandwidth (MB/sec): 23.0938
Min bandwidth (MB/sec): 21.0938
Average IOPS:           5672
Stddev IOPS:            182
Max IOPS:               5912
Min IOPS:               5400
Average Latency(s):     0.00281953
Stddev Latency(s):      0.00190771
Max latency(s):         0.0834767
Min latency(s):         0.00120945

The min latency is fine -- but a max latency of 83 ms? And an average of only 5672 IOPS?

$ sudo rados bench -p scbench 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
    0       0         0         0         0         0            -           0
    1      15     23329     23314   91.0537   91.0703  0.000349856 0.000679074
    2      16     48555     48539   94.7884   98.5352  0.000499159 0.000652067
    3      16     76193     76177   99.1747   107.961  0.000443877 0.000622775
    4      15    103923    103908   101.459   108.324  0.000678589 0.000609182
    5      15    132720    132705   103.663   112.488  0.000741734 0.000595998
    6      15    161811    161796   105.323   113.637  0.000333166 0.000586323
    7      15    190196    190181   106.115   110.879  0.000612227 0.000582014
    8      15    221155    221140   107.966   120.934  0.000471219 0.000571944
    9      16    251143    251127   108.984   117.137  0.000267528 0.000566659
Total time run:       10.000640
Total reads made:     282097
Read size:            4096
Object size:          4096
Bandwidth (MB/sec):   110.187
Average IOPS:         28207
Stddev IOPS:          2357
Max IOPS:             30959
Min IOPS:             23314
Average Latency(s):   0.000560402
Max latency(s):       0.109804
Min latency(s):       0.000212671

This is also quite far from what I expected. I have 12GB of memory for the OSD daemon on each host for caching, and the cluster is close to idle, so that is 50GB+ of cache for a working set of < 6GB. In this case the results should not really be bound by the underlying SSDs at all. But even if they were: IOPS per disk * number of disks / replication => 95K * 6 / 3 => 190K, so roughly 6x off?

There is no measurable service time in iostat when running the tests, so I have come to the conclusion that it has to be either the client side, the network path, or the OSD daemon that is adding the latency and limiting the IOPS. Are there any suggestions on how to get more insight into that?

Has anyone gotten close to the numbers Micron is reporting for NVMe?

Thanks a lot.

[0] https://www.micron.com/-/media/client/global/documents/products/other-documents/micron_9200_max_ceph_12,-d-,2,-d-,8_luminous_bluestore_reference_architecture.pdf?la=en
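
One way to start narrowing down where the latency is added - a sketch, assuming admin access to the OSD hosts and the default admin socket location; osd.0 is just an example id - is to look at the per-OSD latency counters and the slowest recent ops:

ceph osd perf                         # commit/apply latency per OSD as seen by the cluster
ceph daemon osd.0 perf dump           # detailed counters for one OSD, including op latencies (run on its host)
ceph daemon osd.0 dump_historic_ops   # slowest recent ops on that OSD, with per-event timestamps

Re-running rados bench with higher client concurrency, e.g. -t 64 instead of the default 16, can also help separate per-op latency from queue-depth limits on the client side.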