On 2/7/19 8:41 AM, Brett Chancellor wrote:
> This seems right. You are doing a single benchmark from a single client.
> Your limiting factor will be the network latency. For most networks this
> is between 0.2 and 0.3ms. If you're trying to test the potential of your
> cluster, you'll need multiple workers and clients.

Indeed. To add to this, you will need fast (high clock speed!) CPUs in
order to get the latency down. The CPUs will need tuning as well, such as
their power profiles and C-states. You won't get 1:1 performance from the
SSDs on your RBD block devices.
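Two things worth noting before blaming the OSDs, given the numbers below:
with 16 writes in flight and an average latency of ~0.0028s, a single
rados bench client cannot report more than roughly 16 / 0.0028 ≈ 5,700
IOPS, which matches the 5672 you see. So the first step is to drive more
parallelism. Something along these lines (the thread counts and run name
are only examples; check the options against your Luminous man pages):

  # deeper queue from a single client
  $ sudo rados bench -p scbench -b 4096 -t 64 10 write --no-cleanup

  # several clients in parallel (ideally from different hosts), each with
  # its own run name so the object sets don't collide
  $ sudo rados bench -p scbench -b 4096 -t 32 10 write --no-cleanup --run-name client1

And for the CPU tuning I mentioned, roughly this on the OSD (and client)
nodes - distro-dependent, so treat it as a sketch rather than a recipe:

  # run the cores at full clock and keep them out of deep C-states
  $ sudo cpupower frequency-set -g performance
  $ sudo cpupower idle-set -D 0
  # or boot with e.g. intel_idle.max_cstate=1 / processor.max_cstate=1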
Wido

> On Thu, Feb 7, 2019, 2:17 AM <jesper@xxxxxxxx> wrote:
>
> Hi List
>
> We are in the process of moving to the next use case for our Ceph
> cluster. (Bulk, cheap, slow, erasure-coded, CephFS) storage was the
> first - and that works fine.
>
> We're currently on Luminous / BlueStore; if upgrading is deemed to
> change what we're seeing, then please let us know.
>
> We have 6 OSD hosts, each with one 1TB S4510 SSD, connected through an
> H700 MegaRAID PERC with BBWC, each disk as a single-disk RAID0, and the
> scheduler set to deadline, nomerges = 1, rotational = 0.
>
> Each disk "should" give approximately 36K IOPS random write and double
> that for random read.
>
> The pool is set up with 3x replication. We would like a "scale-out"
> setup of well-performing SSD block devices - potentially to host
> databases and things like that. I read through this nice document [0];
> I know the HW is radically different from mine, but I still think I'm
> at the very low end of what 6 x S4510 should be capable of.
>
> Since it is IOPS I care about, I have lowered the block size to 4096 --
> a 4M block size nicely saturates the NICs in both directions.
>
> $ sudo rados bench -p scbench -b 4096 10 write --no-cleanup
> hints = 1
> Maintaining 16 concurrent writes of 4096 bytes to objects of size 4096
> for up to 10 seconds or 0 objects
> Object prefix: benchmark_data_torsk2_11207
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>     0       0         0         0         0         0            -           0
>     1      16      5857      5841   22.8155   22.8164   0.00238437  0.00273434
>     2      15     11768     11753   22.9533   23.0938    0.0028559  0.00271944
>     3      16     17264     17248   22.4564   21.4648   0.00246666  0.00278101
>     4      16     22857     22841   22.3037   21.8477     0.002716  0.00280023
>     5      16     28462     28446   22.2213   21.8945   0.00220186    0.002811
>     6      16     34216     34200   22.2635   22.4766   0.00234315  0.00280552
>     7      16     39616     39600   22.0962   21.0938   0.00290661  0.00282718
>     8      16     45510     45494   22.2118   23.0234    0.0033541  0.00281253
>     9      16     50995     50979   22.1243   21.4258   0.00267282  0.00282371
>    10      16     56745     56729   22.1577   22.4609   0.00252583   0.0028193
> Total time run:          10.002668
> Total writes made:       56745
> Write size:              4096
> Object size:             4096
> Bandwidth (MB/sec):      22.1601
> Stddev Bandwidth:        0.712297
> Max bandwidth (MB/sec):  23.0938
> Min bandwidth (MB/sec):  21.0938
> Average IOPS:            5672
> Stddev IOPS:             182
> Max IOPS:                5912
> Min IOPS:                5400
> Average Latency(s):      0.00281953
> Stddev Latency(s):       0.00190771
> Max latency(s):          0.0834767
> Min latency(s):          0.00120945
>
> Min latency is fine -- but a max latency of 83ms? Average IOPS at 5672?
>
> $ sudo rados bench -p scbench 10 rand
> hints = 1
>   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat(s)  avg lat(s)
>     0       0         0         0         0         0            -           0
>     1      15     23329     23314   91.0537   91.0703  0.000349856  0.000679074
>     2      16     48555     48539   94.7884   98.5352  0.000499159  0.000652067
>     3      16     76193     76177   99.1747   107.961  0.000443877  0.000622775
>     4      15    103923    103908   101.459   108.324  0.000678589  0.000609182
>     5      15    132720    132705   103.663   112.488  0.000741734  0.000595998
>     6      15    161811    161796   105.323   113.637  0.000333166  0.000586323
>     7      15    190196    190181   106.115   110.879  0.000612227  0.000582014
>     8      15    221155    221140   107.966   120.934  0.000471219  0.000571944
>     9      16    251143    251127   108.984   117.137  0.000267528  0.000566659
> Total time run:        10.000640
> Total reads made:      282097
> Read size:             4096
> Object size:           4096
> Bandwidth (MB/sec):    110.187
> Average IOPS:          28207
> Stddev IOPS:           2357
> Max IOPS:              30959
> Min IOPS:              23314
> Average Latency(s):    0.000560402
> Max latency(s):        0.109804
> Min latency(s):        0.000212671
>
> This is also quite far from what I expected. I have 12GB of memory on
> each host for the OSD daemon to use for caching - and a close to idle
> cluster - thus 50GB+ of cache for a working set of < 6GB. This should,
> in this case, not really be bound by the underlying SSDs. But if it
> were:
>
> IOPS/disk * num disks / replication => 95K * 6 / 3 => 190K, or 6x off?
>
> There is no measurable service time in iostat when running the tests,
> so I have come to the conclusion that it has to be either the client
> side, the network path, or the OSD daemon that delivers the increasing
> latency / decreased IOPS.
>
> Are there any suggestions on how to get more insight into that?
>
> Has anyone come close to replicating the numbers Micron is reporting
> on NVMe?
>
> Thanks a lot.
>
> [0]
> https://www.micron.com/-/media/client/global/documents/products/other-documents/micron_9200_max_ceph_12,-d-,2,-d-,8_luminous_bluestore_reference_architecture.pdf?la=en
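To the question above about getting more insight into where the latency
comes from: the OSDs keep an op tracker and performance counters you can
query at runtime via the admin socket. Roughly like this (osd.0 is just
an example id, and the exact counter names differ a bit between
releases):

  # the slowest recent ops, with a per-stage timeline for each op
  $ sudo ceph daemon osd.0 dump_historic_ops

  # internal counters, e.g. the op_r_latency / op_w_latency averages
  $ sudo ceph daemon osd.0 perf dump

  # quick cluster-wide per-OSD commit/apply latency overview
  $ ceph osd perf

If the op tracker shows the time being spent inside the OSD, you can dig
further there; if not, it points back at the network or the client side.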