On 7 February 2019 15:16:11 GMT+03:00, Ryan <rswagoner@xxxxxxxxx> wrote:
I just ran your test on a cluster with 5 hosts, 2x Intel 6130, 12x 860 Evo 2TB SSD per host (6 per SAS3008), 2x bonded 10GB NIC, 2x Arista switches. Pool with 3x replication.

rados bench -p scbench -b 4096 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_dc1-kube-01_3458991
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 5090 5074 19.7774 19.8203 0.00312568 0.00315352
2 16 10441 10425 20.3276 20.9023 0.00332591 0.00307105
3 16 15548 15532 20.201 19.9492 0.00337573 0.00309134
4 16 20906 20890 20.3826 20.9297 0.00282902 0.00306437
5 16 26107 26091 20.3686 20.3164 0.00269844 0.00306698
6 16 31246 31230 20.3187 20.0742 0.00339814 0.00307462
7 16 36372 36356 20.2753 20.0234 0.00286653 0.0030813
8 16 41470 41454 20.2293 19.9141 0.00272051 0.00308839
9 16 46815 46799 20.3011 20.8789 0.00284063 0.00307738
Total time run: 10.0035
Total writes made: 51918
Write size: 4096
Object size: 4096
Bandwidth (MB/sec): 20.2734
Stddev Bandwidth: 0.464082
Max bandwidth (MB/sec): 20.9297
Min bandwidth (MB/sec): 19.8203
Average IOPS: 5189
Stddev IOPS: 118
Max IOPS: 5358
Min IOPS: 5074
Average Latency(s): 0.00308195
Stddev Latency(s): 0.00142825
Max latency(s): 0.0267947
Min latency(s): 0.00217364

rados bench -p scbench 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 39691 39676 154.95 154.984 0.00027022 0.000395993
2 16 83701 83685 163.416 171.91 0.000318949 0.000375363
3 15 129218 129203 168.199 177.805 0.000300898 0.000364647
4 15 173733 173718 169.617 173.887 0.000311723 0.00036156
5 15 216073 216058 168.769 165.391 0.000407594 0.000363371
6 16 260381 260365 169.483 173.074 0.000323371 0.000361829
7 15 306838 306823 171.193 181.477 0.000284247 0.000358199
8 15 353675 353660 172.661 182.957 0.000338128 0.000355139
9 15 399221 399206 173.243 177.914 0.000422527 0.00035393
Total time run: 10.0003
Total reads made: 446353
Read size: 4096
Object size: 4096
Bandwidth (MB/sec): 174.351
Average IOPS: 44633
Stddev IOPS: 2220
Max IOPS: 46837
Min IOPS: 39676
Average Latency(s): 0.000351679
Max latency(s): 0.00530195
Min latency(s): 0.000135292

On Thu, Feb 7, 2019 at 2:17 AM <jesper@xxxxxxxx> wrote:

Hi List
We are in the process of moving to the next use case for our Ceph cluster.
Bulk, cheap, slow, erasure-coded CephFS storage was the first - and
that works fine.
We're currently on Luminous / BlueStore; if upgrading is deemed likely to
change what we're seeing, please let us know.
We have 6 OSD hosts, each with a single 1TB Intel S4510 SSD, connected
through an H700 MegaRAID PERC with BBWC, configured as EachDiskRaid0
(single-disk RAID0) - and the scheduler set to deadline, nomerges = 1, rotational = 0.
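For reference, a minimal sketch of how these block-layer settings are
typically applied via sysfs (sdX is a placeholder for the actual device;
they can be persisted with a udev rule):
$ echo deadline | sudo tee /sys/block/sdX/queue/scheduler
$ echo 1 | sudo tee /sys/block/sdX/queue/nomerges
$ echo 0 | sudo tee /sys/block/sdX/queue/rotational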
Each disk "should" give approximately 36K IOPS random write and the double
random read.
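As a sanity check of that raw-device number, something like this hedged
fio sketch could be run against the bare device (assuming fio with libaio
is available; /dev/sdX is a placeholder, and a raw randwrite test destroys
data on the device):
$ sudo fio --name=rawrandwrite --filename=/dev/sdX --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=32 --numjobs=1 --runtime=30 --time_based --group_reporting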
The pool is set up with 3x replication. We would like a "scale-out" setup of
well-performing SSD block devices - potentially to host databases and
things like that. I read through this nice document [0]; I know the
HW is radically different from mine, but I still think I'm at the
very low end of what 6 x S4510 should be capable of.
Since it is IOPS I care about, I have lowered the block size to 4096 -- a 4M
block size nicely saturates the NICs in both directions.
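For comparison, the bandwidth-bound case is simply the default object size,
roughly like this sketch (rados bench defaults to 4M objects); the results
shown below are for the 4096-byte case only:
$ sudo rados bench -p scbench 10 write --no-cleanup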
$ sudo rados bench -p scbench -b 4096 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096 for
up to 10 seconds or 0 objects
Object prefix: benchmark_data_torsk2_11207
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 5857 5841 22.8155 22.8164 0.00238437 0.00273434
2 15 11768 11753 22.9533 23.0938 0.0028559 0.00271944
3 16 17264 17248 22.4564 21.4648 0.00246666 0.00278101
4 16 22857 22841 22.3037 21.8477 0.002716 0.00280023
5 16 28462 28446 22.2213 21.8945 0.00220186 0.002811
6 16 34216 34200 22.2635 22.4766 0.00234315 0.00280552
7 16 39616 39600 22.0962 21.0938 0.00290661 0.00282718
8 16 45510 45494 22.2118 23.0234 0.0033541 0.00281253
9 16 50995 50979 22.1243 21.4258 0.00267282 0.00282371
10 16 56745 56729 22.1577 22.4609 0.00252583 0.0028193
Total time run: 10.002668
Total writes made: 56745
Write size: 4096
Object size: 4096
Bandwidth (MB/sec): 22.1601
Stddev Bandwidth: 0.712297
Max bandwidth (MB/sec): 23.0938
Min bandwidth (MB/sec): 21.0938
Average IOPS: 5672
Stddev IOPS: 182
Max IOPS: 5912
Min IOPS: 5400
Average Latency(s): 0.00281953
Stddev Latency(s): 0.00190771
Max latency(s): 0.0834767
Min latency(s): 0.00120945
Min latency is fine -- but a max latency of 83 ms?
Average IOPS at only 5672?
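For context on the average, worked out from the numbers above: with 16
concurrent ops and ~2.8 ms average latency, queueing alone caps a single
client at roughly 16 / 0.00282 s ≈ 5670 IOPS, which is about what we see.
A deeper client queue could be probed with the -t option of rados bench,
e.g. this untested sketch:
$ sudo rados bench -p scbench -b 4096 -t 64 10 write --no-cleanup
The per-op latency itself is still the interesting part.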
$ sudo rados bench -p scbench 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 23329 23314 91.0537 91.0703 0.000349856 0.000679074
2 16 48555 48539 94.7884 98.5352 0.000499159 0.000652067
3 16 76193 76177 99.1747 107.961 0.000443877 0.000622775
4 15 103923 103908 101.459 108.324 0.000678589 0.000609182
5 15 132720 132705 103.663 112.488 0.000741734 0.000595998
6 15 161811 161796 105.323 113.637 0.000333166 0.000586323
7 15 190196 190181 106.115 110.879 0.000612227 0.000582014
8 15 221155 221140 107.966 120.934 0.000471219 0.000571944
9 16 251143 251127 108.984 117.137 0.000267528 0.000566659
Total time run: 10.000640
Total reads made: 282097
Read size: 4096
Object size: 4096
Bandwidth (MB/sec): 110.187
Average IOPS: 28207
Stddev IOPS: 2357
Max IOPS: 30959
Min IOPS: 23314
Average Latency(s): 0.000560402
Max latency(s): 0.109804
Min latency(s): 0.000212671
This is also quite far from expected. I have 12GB of memory for OSD-daemon
caching on each host and a close-to-idle cluster - thus 50GB+ of cache
against a working set of < 6GB. In this case the test should not really be
bound by the underlying SSDs. But if it were:
IOPS/disk * num disks / replication => 95K * 6 / 3 => 190K, or 6x off?
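To rule out the cache simply being configured smaller than I think, the
effective BlueStore cache settings can be read from the admin socket
(a sketch; osd.0 as an example, run on the host carrying it - option names
as of Luminous):
$ sudo ceph daemon osd.0 config get bluestore_cache_size
$ sudo ceph daemon osd.0 config get bluestore_cache_size_ssd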
No measurable service time in iostat when running the tests, so I have
come to the conclusion that it has to be either the client side, the
network path, or the OSD daemon that delivers the increasing latency /
decreased IOPS.
Are there any suggestions on how to get more insight into that?
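One thing I could do (a rough sketch, assuming the admin socket is reachable
on the OSD hosts; osd.0 is just an example) is to watch per-OSD latency
counters and the slowest recent ops while the benchmark runs:
$ sudo ceph osd perf                        # commit/apply latency per OSD
$ sudo ceph daemon osd.0 perf dump          # detailed latency counters (run on the OSD's host)
$ sudo ceph daemon osd.0 dump_historic_ops  # slowest recent ops with per-stage timestamps
But maybe there are better tools for this.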
Has anyone come close to replicating the numbers Micron are reporting on NVMe?
Thanks a lot.
[0]
https://www.micron.com/-/media/client/global/documents/products/other-documents/micron_9200_max_ceph_12,-d-,2,-d-,8_luminous_bluestore_reference_architecture.pdf?la=en
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
--
With best regards,
Vitaliy Filippov