I see, the second one is the read bench. Even in the 2-node scenario the read performance is pretty bad. Have you verified the hardware with micro-benchmarks such as fio? Also try reviewing the storage controller settings.
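One way to act on the fio suggestion is to test each OSD disk directly, outside Ceph. Below is a minimal Python sketch that builds a read-only fio job for one block device. It loosely mirrors the rados bench runs quoted below (4 MiB I/Os, 16 in flight). The helper name fio_read_cmd and the device path /dev/sdX are hypothetical placeholders. The job only allows read patterns and passes fio's --readonly guard, so it will not write to a disk that holds OSD data.

```python
import shlex


def fio_read_cmd(device: str, pattern: str = "read", block_size: str = "4M",
                 iodepth: int = 16, runtime_s: int = 60) -> list[str]:
    """Build a direct, read-only fio command line for one block device.

    Hypothetical helper: 4M blocks and an I/O depth of 16 roughly mirror the
    rados bench runs in this thread (4 MiB objects, 16 concurrent ops).
    """
    # Refuse write patterns: the target disk may hold OSD data.
    if pattern not in ("read", "randread"):
        raise ValueError(f"refusing non-read pattern {pattern!r}")
    return [
        "fio",
        "--name=osd-disk-check",
        f"--filename={device}",
        f"--rw={pattern}",
        f"--bs={block_size}",
        "--direct=1",          # bypass the page cache
        "--ioengine=libaio",   # Linux native asynchronous I/O
        f"--iodepth={iodepth}",
        f"--runtime={runtime_s}",
        "--time_based",        # keep going for the full runtime
        "--readonly",          # fio-level guard against writes
    ]


if __name__ == "__main__":
    cmd = fio_read_cmd("/dev/sdX")  # hypothetical device path
    # Print a copy-pasteable command; run it as root on each OSD node,
    # e.g. via subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Run the same job against every node's HDD, and against the SSD as well. Comparing the results should show whether one disk or controller is much slower than the others.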
On Apr 19, 2018 5:13 PM, "Steven Vacaroaia" <stef97@xxxxxxxxx> wrote:
Replication size is always 2.
DB/WAL are on the HDD in this case.
I tried OSDs with WAL/DB on SSD; they exhibit the same symptoms (cur MB/s 0).
In summary, it does not matter:
- which server (any 2 work better than any 3 or 4)
- replication size (I tried size 2 and 3)
- location of WAL/DB (on a separate SSD or on the same HDD)
Thanks
Steven

On Thu, 19 Apr 2018 at 12:06, Hans van den Bogert <hansbogert@xxxxxxxxx> wrote:
I take it that the first bench is with replication size 2 and the second bench is with replication size 3? Same for the 4-node OSD scenario?
Also, please let us know how you set up block.db and WAL: are they on the SSD?

On Thu, Apr 19, 2018, 14:40 Steven Vacaroaia <stef97@xxxxxxxxx> wrote:
Sure, thanks for your willingness to help.
Identical servers.
Hardware:
DELL R620, 6 cores, 64 GB RAM, 2 x 10 Gb ports
Enterprise HDD 600 GB (Seagate ST600MM0006), enterprise-grade SSD 340 GB (Toshiba PX05SMB040Y)

All tests were done with the following command:
rados bench -p rbd 50 write --no-cleanup && rados bench -p rbd 50 seq

ceph osd pool ls detail
{
  "pool_name": "rbd",
  "flags": 1,
  "flags_names": "hashpspool",
  "type": 1,
  "size": 2,
  "min_size": 1,
  "crush_rule": 1,
  "object_hash": 2,
  "pg_num": 64,
  "pg_placement_num": 64,
  "crash_replay_interval": 0,
  "last_change": "354",
  "last_force_op_resend": "0",
  "last_force_op_resend_preluminous": "0",
  "auid": 0,
  "snap_mode": "selfmanaged",
  "snap_seq": 0,
  "snap_epoch": 0,
  "pool_snaps": [],
  "removed_snaps": "[]",
  "quota_max_bytes": 0,
  "quota_max_objects": 0,
  "tiers": [],
  "tier_of": -1,
  "read_tier": -1,
  "write_tier": -1,
  "cache_mode": "none",
  "target_max_bytes": 0,
  "target_max_objects": 0,
  "cache_target_dirty_ratio_micro": 400000,
  "cache_target_dirty_high_ratio_micro": 600000,
  "cache_target_full_ratio_micro": 800000,
  "cache_min_flush_age": 0,
  "cache_min_evict_age": 0,
  "erasure_code_profile": "",
  "hit_set_params": {"type": "none"},
  "hit_set_period": 0,
  "hit_set_count": 0,
  "use_gmt_hitset": true,
  "min_read_recency_for_promote": 0,
  "min_write_recency_for_promote": 0,
  "hit_set_grade_decay_rate": 0,
  "hit_set_search_last_n": 0,
  "grade_table": [],
  "stripe_width": 0,
  "expected_num_objects": 0,
  "fast_read": false,
  "options": {},
  "application_metadata": {}
}

ceph osd crush rule dump
[
  {
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
      {"op": "take", "item": -1, "item_name": "default"},
      {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
      {"op": "emit"}
    ]
  },
  {
    "rule_id": 1,
    "rule_name": "rbd",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
      {"op": "take", "item": -9, "item_name": "sas"},
      {"op": "chooseleaf_firstn", "num": 0, "type": "host"},
      {"op": "emit"}
    ]
  }
]

2 servers, 2 OSDs
ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
 -9       4.00000 root sas
-10       1.00000     host osd01-sas
  2   hdd 1.00000         osd.2          up        0 1.00000
-11       1.00000     host osd02-sas
  3   hdd 1.00000         osd.3          up        0 1.00000
-12       1.00000     host osd03-sas
  5   hdd 1.00000         osd.5          up  1.00000 1.00000
-19       1.00000     host osd04-sas
  6   hdd 1.00000         osd.6          up  1.00000 1.00000

2018-04-19 09:19:01.266010 min lat: 0.0412473 max lat: 1.03227 avg lat: 0.331163
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16      1941      1925   192.478       192    0.315461    0.331163
   41      16      1984      1968   191.978       172    0.262268    0.331529
   42      16      2032      2016   191.978       192    0.326608    0.332061
   43      16      2081      2065   192.071       196    0.345757    0.332389
   44      16      2123      2107   191.524       168    0.307759    0.332745
   45      16      2166      2150    191.09       172    0.318577    0.333613
   46      16      2214      2198   191.109       192    0.329559    0.333703
   47      16      2257      2241   190.702       172    0.423664     0.33427
   48      16      2305      2289   190.729       192    0.357342    0.334386
   49      16      2348      2332   190.346       172     0.30218    0.334735
   50      16      2396      2380   190.379       192    0.318226    0.334981
Total time run:         50.281886
Total writes made:      2397
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     190.685
Stddev Bandwidth:       24.5781
Max bandwidth (MB/sec): 340
Min bandwidth (MB/sec): 164
Average IOPS:           47
Stddev IOPS:            6
Max IOPS:               85
Min IOPS:               41
Average Latency(s):     0.335515
Stddev Latency(s):      0.0867836
Max latency(s):         1.03227
Min latency(s):         0.0412473

2018-04-19 09:19:52.340092 min lat: 0.0209445 max lat: 14.9208 avg lat: 1.31352
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16       296       280   27.9973         0           -     1.31352
   41      16       296       280   27.3144         0           -     1.31352
   42      16       296       280    26.664         0           -     1.31352
   43      16       323       307   28.5553         9   0.0429661     2.20267
   44      16       323       307   27.9063         0           -     2.20267
   45      16       363       347   30.8414        80   0.0922424     2.05975
   46      16       370       354   30.7795        28   0.0302223     2.02055
   47      16       370       354   30.1246         0           -     2.02055
   48      16       386       370   30.8303        32     2.72624     2.06407
   49      16       386       370   30.2011         0           -     2.06407
   50      16       400       384   30.7169        28     2.10543     2.07055
   51      16       401       385   30.1931         4     2.53183     2.07175
   52      16       401       385   29.6124         0           -     2.07175
   53      16       401       385   29.0537         0           -     2.07175
   54      16       401       385   28.5157         0           -     2.07175
   55      16       401       385   27.9972         0           -     2.07175
   56      16       401       385   27.4972         0           -     2.07175
Total time run:         56.042520
Total reads made:       401
Read size:              4194304
Object size:            4194304
Bandwidth (MB/sec):     28.6211
Average IOPS:           7
Stddev IOPS:            11
Max IOPS:               47
Min IOPS:               0
Average Latency(s):     2.23525
Max latency(s):         29.5553
Min latency(s):         0.0209445

4 servers, 4 OSDs
ceph osd tree
ID  CLASS WEIGHT  TYPE NAME          STATUS REWEIGHT PRI-AFF
 -9       4.00000 root sas
-10       1.00000     host osd01-sas
  2   hdd 1.00000         osd.2          up  1.00000 1.00000
-11       1.00000     host osd02-sas
  3   hdd 1.00000         osd.3          up  1.00000 1.00000
-12       1.00000     host osd03-sas
  5   hdd 1.00000         osd.5          up  1.00000 1.00000
-19       1.00000     host osd04-sas
  6   hdd 1.00000         osd.6          up  1.00000 1.00000

2018-04-19 09:35:43.558843 min lat: 0.0141657 max lat: 11.3013 avg lat: 1.25618
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16       482       466   46.5956         0           -     1.25618
   41      16       488       472   46.0444        12   0.0175485     1.25181
   42      16       488       472   44.9481         0           -     1.25181
   43      16       488       472   43.9028         0           -     1.25181
   44      16       562       546   49.6316   98.6667   0.0150341     1.26385
   45      16       569       553   49.1508        28   0.0151556     1.25516
   46      16       569       553   48.0823         0           -     1.25516
   47      16       569       553   47.0593         0           -     1.25516
   48      16       569       553   46.0789         0           -     1.25516
   49      16       569       553   45.1386         0           -     1.25516
   50      16       569       553   44.2358         0           -     1.25516
   51      16       569       553   43.3684         0           -     1.25516
Total time run:         51.724920
Total writes made:      570
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     44.0793
Stddev Bandwidth:       55.3843
Max bandwidth (MB/sec): 232
Min bandwidth (MB/sec): 0
Average IOPS:           11
Stddev IOPS:            13
Max IOPS:               58
Min IOPS:               0
Average Latency(s):     1.45175
Stddev Latency(s):      2.9411
Max latency(s):         11.3013
Min latency(s):         0.0141657

2018-04-19 09:36:35.633624 min lat: 0.00804825 max lat: 10.2583 avg lat: 1.03388
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16       479       463   46.2955         0           -     1.03388
   41      16       540       524   51.1169      24.4  0.00913275     1.23193
   42      16       540       524   49.8999         0           -     1.23193
   43      16       541       525   48.8324         2     2.31401     1.23399
   44      16       541       525   47.7226         0           -     1.23399
   45      16       541       525   46.6621         0           -     1.23399
   46      16       541       525   45.6477         0           -     1.23399
   47      16       541       525   44.6765         0           -     1.23399
   48      16       541       525   43.7458         0           -     1.23399
   49      16       541       525    42.853         0           -     1.23399
   50      16       541       525    41.996         0           -     1.23399
   51      16       541       525   41.1725         0           -     1.23399
Total time run:         51.530655
Total reads made:       542
Read size:              4194304
Object size:            4194304
Bandwidth (MB/sec):     42.072
Average IOPS:           10
Stddev IOPS:            15
Max IOPS:               62
Min IOPS:               0
Average Latency(s):     1.5204
Max latency(s):         11.4841
Min latency(s):         0.00627081

Many thanks
Steven

On Thu, 19 Apr 2018 at 08:42, Hans van den Bogert <hansbogert@xxxxxxxxx> wrote:
Hi Steven,
There is only one bench. Could you show multiple benches of the different scenarios you discussed?
Also provide hardware details.
Hans

On Apr 19, 2018 13:11, "Steven Vacaroaia" <stef97@xxxxxxxxx> wrote:
Hi,
Any idea why 2 servers with one OSD each provide better performance than 3?
The servers are identical.
Performance is impacted irrespective of whether I use SSD for WAL/DB or not.
Basically, I am getting lots of "cur MB/s" values of zero.
The network is separate 10 Gb for public and private.
I tested it with iperf and I am getting 9.3 Gbit/s.
I have tried replication size 2 and 3 with the same results (much better for 2 servers than 3).
I reinstalled CEPH multiple times.
ceph.conf is very simple, with no major customization (see below).
I am out of ideas; any hint will be TRULY appreciated.
Steven

auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public_network = 10.10.30.0/24
cluster_network = 192.168.0.0/24
osd_pool_default_size = 2
osd_pool_default_min_size = 1 # Allow writing 1 copy in a degraded state
osd_crush_chooseleaf_type = 1

[mon]
mon_allow_pool_delete = true
mon_osd_min_down_reporters = 1

[osd]
osd_mkfs_type = xfs
osd_mount_options_xfs = "rw,noatime,nodiratime,attr2,logbufs=8,logbsize=256k,largeio,inode64,swalloc,allocsize=4M"
osd_mkfs_options_xfs = "-f -i size=2048"
bluestore_block_db_size = 32212254720
bluestore_block_wal_size = 1073741824

rados bench -p rbd 120 write --no-cleanup && rados bench -p rbd 120 seq
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 120 seconds or 0 objects
Object prefix: benchmark_data_osd01_383626
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        57        41   163.991       164    0.197929    0.065543
    2      16        57        41    81.992         0           -    0.065543
    3      16        67        51   67.9936        20   0.0164632    0.249939
    4      16        67        51   50.9951         0           -    0.249939
    5      16        71        55   43.9958         8   0.0171439    0.319973
    6      16       181       165   109.989       440   0.0159057    0.563746
    7      16       182       166   94.8476         4    0.221421    0.561684
    8      16       182       166   82.9917         0           -    0.561684
    9      16       240       224   99.5458       116   0.0232989    0.638292
   10      16       264       248   99.1901        96   0.0222669    0.583336
   11      16       264       248   90.1729         0           -    0.583336
   12      16       285       269   89.6579        42   0.0165706    0.600606
   13      16       285       269   82.7611         0           -    0.600606
   14      16       310       294   83.9918        50   0.0254241    0.756351
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
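The "cur MB/s 0" symptom discussed in this thread can be counted directly from the per-second rados bench rows. A 0 in that column means the "finished" count did not increase, so no operation completed in that second. Below is a minimal Python sketch, assuming the rows are pasted as whitespace-separated text in the column order of the rados bench header. The helper names are hypothetical, and the sample is the 2-server seq (read) run from the thread. summary_bandwidth recomputes the "Bandwidth (MB/sec)" line as completed ops x object size / run time, in MiB/s. This formula is consistent with the summary figures quoted above.

```python
from dataclasses import dataclass


@dataclass
class BenchRow:
    sec: int
    cur_ops: int
    started: int
    finished: int
    avg_mbs: float
    cur_mbs: float


def parse_rows(text: str) -> list[BenchRow]:
    """Parse per-second rados bench rows, skipping headers and summary lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have exactly 8 columns and start with the elapsed second;
        # only the first six columns are kept here.
        if len(fields) != 8 or not fields[0].isdigit():
            continue
        sec, cur_ops, started, finished = (int(f) for f in fields[:4])
        rows.append(BenchRow(sec, cur_ops, started, finished,
                             float(fields[4]), float(fields[5])))
    return rows


def stall_seconds(rows: list[BenchRow]) -> list[int]:
    """Seconds in which no operation completed (cur MB/s reported as 0)."""
    return [r.sec for r in rows if r.cur_mbs == 0]


def summary_bandwidth(ops: int, object_size: int, seconds: float) -> float:
    """Ops x object size / run time, in MiB/s (the 'Bandwidth (MB/sec)' line)."""
    return ops * object_size / seconds / (1024 * 1024)


# Tail of the 2-server seq (read) run quoted in the thread.
SAMPLE = """\
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
   40      16       296       280   27.9973         0           -     1.31352
   41      16       296       280   27.3144         0           -     1.31352
   42      16       296       280    26.664         0           -     1.31352
   43      16       323       307   28.5553         9   0.0429661     2.20267
   44      16       323       307   27.9063         0           -     2.20267
   45      16       363       347   30.8414        80   0.0922424     2.05975
   46      16       370       354   30.7795        28   0.0302223     2.02055
   47      16       370       354   30.1246         0           -     2.02055
   48      16       386       370   30.8303        32     2.72624     2.06407
   49      16       386       370   30.2011         0           -     2.06407
   50      16       400       384   30.7169        28     2.10543     2.07055
   51      16       401       385   30.1931         4     2.53183     2.07175
   52      16       401       385   29.6124         0           -     2.07175
   53      16       401       385   29.0537         0           -     2.07175
   54      16       401       385   28.5157         0           -     2.07175
   55      16       401       385   27.9972         0           -     2.07175
   56      16       401       385   27.4972         0           -     2.07175
Total time run: 56.042520
"""

if __name__ == "__main__":
    rows = parse_rows(SAMPLE)
    stalls = stall_seconds(rows)
    print(f"{len(stalls)} of {len(rows)} sampled seconds completed no ops: {stalls}")
    print(f"{summary_bandwidth(401, 4194304, 56.042520):.4f} MB/s")
```

For the 2-server read sample, this reports 11 stalled seconds out of the 17 sampled. Running it on the 4-server runs makes it easier to compare how often each configuration stalls.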